Abstract—Caching popular contents at base stations (BSs) can reduce the backhaul cost and improve the network throughput. Yet whether local caching at the BSs can improve the energy efficiency (EE), a major goal for 5th generation cellular networks, remains unclear. Many factors have an entangled impact on EE, such as interference level, backhaul capacity, BS density, power consumption parameters, BS sleeping, content popularity and cache capacity. Another important question is therefore which key factors contribute most to the EE gain from caching. In this paper, we explore the potential EE of cache-enabled wireless access networks and identify the key factors. By deriving a closed-form expression of the approximated EE, we provide the condition under which the EE can benefit from caching, find the optimal cache capacity that maximizes the network EE, and analyze the maximal EE gain brought by caching. We show that caching at the BSs can improve the network EE when power-efficient cache hardware is used. When local caching has an EE gain over not caching, caching more contents at the BSs may not provide higher EE. Numerical and simulation results show that the caching EE gain is large in four situations: when the backhaul capacity is stringent, when the interference level is low, when the content popularity is skewed, and when caching at pico BSs instead of macro BSs.

Index Terms—Energy efficiency, cache, wireless access networks, downlink

I. Introduction

To meet the explosive demands for throughput, support sustainable development and reduce global carbon dioxide emissions, energy efficiency (EE) has become a major performance metric for 5th generation (5G) cellular networks. The EE of a network can be improved from various aspects, such as introducing new network architectures [2] or optimizing network deployment and resource allocation [3,4]. An alternative approach is to rethink the goal of the network. Recently, it has been observed that a large portion of mobile multimedia traffic is generated by many duplicate downloads of a few popular contents [5,6]. This reflects a shift in the major goal of networks from traditional transmitter-receiver communication to content dissemination. On the other hand, the storage capacity of today's memory devices is growing rapidly. As a consequence, equipping base stations (BSs) with caches offers a promising way to unleash the potential of cellular networks, besides continuing to densify the networks [7,8].
Caching is a well-known technique for improving performance in many wired network domains, e.g., content-centric networks (CCN) [9-11]. In cellular networks, caching popular contents at the edge can reduce the backhaul cost, access latency and energy consumption, as well as boost the throughput.

The backhaul becomes a bottleneck in small cell networks (SCNs), and therefore in the ultra dense networks (UDNs) of 5G, while disk size increases quickly at a relatively low cost. Noting this, the authors in [12] suggested replacing backhaul links by equipping the BSs with caches. By optimizing the caching policies to serve more users under constraints on the file downloading time, they reported a large throughput gain. The authors in [13] considered SCNs with very limited backhaul capacity and cached files based on their popularity; they observed that the backhaul traffic load can be reduced by caching at the BSs.

Several works optimized caching for energy or cost:

• To minimize the total energy consumed by caching and by data transport between BSs, or between BSs and servers, a policy for allocating cache size to BSs and the service gateway (SGW) was optimized in [14].

• To minimize the total service cost, the caching policy was optimized in [15], where the impact of multicast transmission was taken into account.

• In [16], data sharing among backhaul and cooperative beamforming were jointly optimized to minimize the backhaul cost and transmit power of cache-enabled systems.

• For heterogeneous networks, user access and content caching were jointly optimized to minimize the average access delay in [17], and a coded caching scheme was optimized to achieve information-theoretic bounds in [18].

For highly skewed demands, caches should be pushed to the edge, e.g., the SGW or the BSs of cellular networks [13]. Compared with caching at the SGW, caching at the BSs creates higher levels of redundancy, with more replicas of the same content stored. Since caches also consume power, it remains unknown whether local caching at the BSs can improve the EE of wireless access networks.

Somewhat related problems have been investigated in the context of CCN [9-11], but local caching in cellular networks brings new challenges. In CCN, energy can be effectively saved by reducing user-content distances and eliminating duplicated transmissions. Yet in wireless access networks, duplicated transmissions over the air cannot be removed, due to asynchronous requests from the users [7]. This holds even though caching at the BSs can reduce the traffic load in core and backhaul networks. Instead, in dense cellular networks, energy can be reduced in two ways: by turning BSs with no or light traffic load into sleep mode [19], and by controlling interference. Furthermore, many factors have an entangled impact on the EE of wireless access networks, such as backhaul capacity, interference level, power consumption parameters, BS density, BS sleeping and user access. The content popularity, cache size (i.e., cache capacity) and caching policy also play a role.

In this paper, we attempt to explore the potential of EE 
in cache-enabled wireless access networks and identify the 
key impacting factors. Specifically, we strive to answer the 
following fundamental questions. 

• Will caching at the BSs bring an EE gain? If so, under what condition?

• What is the relation between EE and cache size? Is there a tradeoff, or should the cache size be optimized?

• What is the impact of network density? Where in the access network is caching more energy efficient?

To this end, we consider a downlink multicell multiuser multi-antenna network. Our aim is to show the EE gain of caching at the BSs over caching at the SGW (i.e., not caching at the BSs). We therefore assume that the contents have been placed in the caches of the BSs by broadcasting during off-peak times. Hence, we consider the energy consumed for content delivery but ignore the energy consumed for cache placement.

With the aim of finding the critical factors that impact the EE gain, we optimize the configuration in two phases, based on the statistics of the user demands and under different levels of interference:

• in the cache placement phase, i.e., where to cache and how much to cache;

• in the delivery phase, i.e., the maximal transmit power of each BS.

The major contributions of this paper are summarized as 
follows. 

• We derive a closed-form expression of the approximated EE for cache-enabled networks. It takes into account the transmit and circuit power consumption at the BSs, as well as the power consumed for backhauling and caching at the BSs.

• We provide the condition under which EE can benefit from caching, find the optimal cache capacity that maximizes the network EE, and analyze the maximal EE gain brought by caching.

• We show that caching at the BSs may not improve the 
network EE. When caching brings an EE gain, caching 
more contents at the BSs may not always increase the EE. 
Both numerical and simulation results show that caching 
at pico BSs can provide higher EE gain than caching at 
macro BSs. 

The rest of this paper is organized as follows. In Section II, 
we present the system model. The EE of the cache-enabled 
access network is derived and analyzed in Section III and 
Section IV, respectively. The numerical and simulation results 
are provided in Section V, and the conclusions are drawn in 
Section VI. 


II. System Model 

Consider a downlink network consisting of $N_b$ BSs. Each BS has $N_t$ antennas and serves multiple users, each with a single antenna. Each BS is equipped with a cache and is connected to the core network via backhaul.

To understand the potential EE of cache-enabled wireless networks and identify the key impacting factors, we make the following assumptions in the analysis. They define a simple scenario but capture the basic elements.

• We use circular cells, each with radius $D$, to approximate hexagonal cells for ease of analysis.

• Each content has an equal size of $F$ bits as in [10,12,20], for mathematical tractability and notational simplicity.¹

• The content popularity distribution changes slowly with time [12]. It can therefore be regarded as static, and the energy consumed for refreshing the cached contents can be safely neglected. Specifically, we consider a static content catalog that contains $N_f$ contents, ranked by popularity from the most popular (the 1st content) to the least popular (the $N_f$th content). In practice, the Zipf-like distribution is widely applied to characterize many real-world phenomena. Assume that each user requests one content from the catalog, and the probability of requesting the $f$th content is [21]

$$p_f = \frac{f^{-\delta}}{\sum_{j=1}^{N_f} j^{-\delta}} \tag{1}$$


where the typical value of $\delta$ is between 0.5 and 1.0, which determines the "peakiness" of the distribution [22]. Since $\delta$ reflects different levels of skewness of the distribution, it is called the skew parameter.

• The spatial distribution of the users is modeled as a homogeneous Poisson point process (PPP) [23,24], where the average number of users in the whole network is $\lambda$.² Then, the probability that there are $K$ users in each cell is $\frac{(\lambda/N_b)^K}{K!}e^{-\lambda/N_b}$.

• Each user is associated with the closest BS,³ which is called its local BS, and each BS caches the $N_c$ most popular contents. In fact, consider a static content catalog, with each user associated with its local BS and all users' requests identically distributed. In this setting, caching the most popular contents everywhere is the optimal caching strategy in terms of maximizing the cache hit ratio [7].

• Each BS serves the associated users with zero-forcing beamforming (ZFBF), which is a widely used precoder to eliminate multi-user interference [26], and with equal power allocation among multiple users.⁴
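The Python sketch below gives a quick numerical illustration of the popularity and user-distribution assumptions. It evaluates three quantities:

• the Zipf-like request probabilities in (1);
• the hit ratio obtained when a BS stores the $N_c$ most popular contents;
• the Poisson probability of $K$ users in a cell with mean $\lambda/N_b$.

The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def zipf_pmf(n_files, delta):
    """Request probabilities p_f, f = 1..n_files, of the Zipf-like law (1)."""
    weights = [f ** (-delta) for f in range(1, n_files + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


def hit_ratio_most_popular(n_cached, n_files, delta):
    """Probability that a request is for one of the n_cached most popular contents."""
    return sum(zipf_pmf(n_files, delta)[:n_cached])


def users_in_cell_pmf(k, lam, n_bs):
    """Probability of k users in a cell: Poisson with mean lam / n_bs under the PPP assumption."""
    mean = lam / n_bs
    return mean ** k * math.exp(-mean) / math.factorial(k)


if __name__ == "__main__":
    # Hypothetical values: 1000 contents with skew 0.8, 50 cached; 70 users over 10 cells
    print(hit_ratio_most_popular(50, 1000, 0.8))
    print(users_in_cell_pmf(0, 70.0, 10))  # probability that a cell has no user
```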

Denote $\mathcal{C}_b = \{1,2,\cdots,N_c\}$ as the set of contents cached at the $b$th BS (denoted by BS$_b$), $b = 1,\cdots,N_b$; then the cache capacity of each BS is $N_cF$. When a user requests a content that is cached at its local BS, the BS fetches the content from the cache directly and then transmits it to the user.


¹When the content size is random, we can show that the performance depends on the average content size, and the main results do not change.

²When this assumption does not hold, e.g., if the users are distributed within hotspot areas, the network EE becomes lower due to stronger interference. Nonetheless, the main results still hold.

³User association based on the instantaneous channel gain would cause unnecessary handovers (i.e., the so-called "ping-pong effect") [25]. For mathematical tractability, we do not consider shadowing, which does not change the main trends of the performance.

⁴Optimizing power allocation is rather involved in the considered setting with limited-capacity backhaul. Moreover, a closed-form expression, even for an approximated EE with optimal power allocation, is hard, if not impossible, to obtain. Equal power allocation provides a lower bound on the EE. This bound nevertheless reflects the main trends of the EE, and becomes near optimal when the signal-to-interference-plus-noise ratio (SINR) is high.
Otherwise, the BS fetches the content from the core network via the backhaul link.

To reduce energy consumption and avoid interference, we consider BS idling with periods ranging from very short (less than 1 ms) to longer (e.g., 100 ms) [19]. Once a BS has no user to serve, it is turned into idle mode. Otherwise, it operates in active mode. According to the spatial distribution of the users, the probability that BS$_b$ is active is $p_a = 1-e^{-\lambda/N_b}$.

We do not restrict the type of caching hardware, and some types cannot be switched off while contents are cached, e.g., dynamic random access memory (DRAM). We therefore do not consider cache idling.⁵

The network EE is associated with the throughput, which largely depends on the interference level. To capture the essence of the problem and simplify the analysis, we introduce a parameter that reflects the portion of inter-cell interference (ICI) that can be removed in a network. It ranges from the best case to the worst case, as detailed later.

When the user density is high, the number of users in a cell can exceed $N_t$, and we can then select users to serve according to a certain criterion. When round-robin scheduling is used to select $N_t$ users to serve, the probability that BS$_b$ serves $K_b$ users can be derived as


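$$p_{K_b} = \begin{cases}\dfrac{(\lambda/N_b)^{K_b}}{K_b!}e^{-\lambda/N_b}, & \text{if } K_b < N_t\\[2mm] 1-\displaystyle\sum_{k=0}^{N_t-1}\frac{(\lambda/N_b)^{k}}{k!}e^{-\lambda/N_b}, & \text{if } K_b = N_t\end{cases} \tag{2}$$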


The probabilities for other user scheduling schemes can also be derived, but they are omitted for conciseness.
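The truncated Poisson law in (2) can be evaluated directly. The sketch below is a minimal illustration that assumes round-robin selection, as stated above. `served_users_pmf` is a hypothetical helper name. The last entry of its output collects all cells with at least $N_t$ users, and the complement of the first entry is the active probability $p_a = 1-e^{-\lambda/N_b}$.

```python
import math


def served_users_pmf(n_antennas, lam, n_bs):
    """Return [p_0, ..., p_{N_t}] from (2): a BS serves min(K, N_t) of its K users."""
    mean = lam / n_bs
    below = [mean ** k * math.exp(-mean) / math.factorial(k) for k in range(n_antennas)]
    # Every cell with at least N_t users is served with exactly N_t streams
    return below + [1.0 - sum(below)]


if __name__ == "__main__":
    pmf = served_users_pmf(4, 70.0, 10)  # hypothetical: N_t = 4, 7 users per cell on average
    print([round(p, 4) for p in pmf])
    print("p_a =", 1.0 - pmf[0])  # probability that the BS is active
```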

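Denote $\mathbf{H}_b = [r_{1b}^{-\alpha/2}\mathbf{h}_{1b},\cdots,r_{K_bb}^{-\alpha/2}\mathbf{h}_{K_bb}]^H$ as the downlink channel matrix from BS$_b$ to the $K_b$ users located in the $b$th cell. Here $r_{kb}$ and $\mathbf{h}_{kb}$ are respectively the distance and the small-scale Rayleigh fading channel vector from BS$_b$ to the $k$th user (denoted by MS$_k$), and $\alpha$ is the path-loss exponent.

When perfect channel information is available at each BS, the ZFBF vectors at BS$_b$ can be computed as $\mathbf{W}_b = [\mathbf{w}_{1b},\cdots,\mathbf{w}_{K_bb}]$. Here $\mathbf{w}_{kb} = \tilde{\mathbf{w}}_{kb}/\|\tilde{\mathbf{w}}_{kb}\|$, and $\tilde{\mathbf{w}}_{kb}$ denotes the $k$th column vector of $\mathbf{H}_b^{\dagger}$. The operators $(\cdot)^{\dagger}$, $(\cdot)^H$ and $\|\cdot\|$ stand for the Moore-Penrose inverse, conjugate transpose and Euclidean norm, respectively.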

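Then, the instantaneous received SINR of MS$_k$ served by BS$_b$ when the BS is active is

$$\gamma_{kb} = \frac{Pr_{kb}^{-\alpha}\left|\mathbf{h}_{kb}^H\mathbf{w}_{kb}\right|^2}{K_b\left(\beta PI_{kb}+\sigma^2\right)} \tag{3}$$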


where:

• $I_{kb}$ is the power of ICI normalized by the transmit power $P$ at each BS. Its expression involves an indicator $\xi_j$ for the status of BS$_j$: $\xi_j = 1$ if BS$_j$ is active, and $\xi_j = 0$ otherwise.

• $\sigma^2$ is the variance of the white Gaussian noise.

• $\beta\in[0,1]$ reflects how much ICI can be removed by some sort of interference management technique. For example, $\beta = 0$ reflects the optimistic scenario, where all ICI is assumed to be completely eliminated. $\beta = 1$ reflects the pessimistic case, where no interference coordination is assumed among the BSs.


⁵Some cache hardware, such as hard disk drives (HDDs) or solid state drives (SSDs), can be switched off without losing the cached contents. When a BS is turned into deep sleep (e.g., for periods of hours), such cache hardware can be switched off to further reduce energy consumption.


Requested contents that are not cached at BS$_b$ need to be fetched via backhaul, and the backhaul traffic load is constrained by the backhaul capacity. The instantaneous downlink throughput of the $b$th cell can therefore be expressed as

$$R_b = \underbrace{B\sum_{f_k\in\mathcal{C}_b}\log_2(1+\gamma_{kb})}_{R_{b,\rm ca}} + \underbrace{\min\Big(B\sum_{f_k\notin\mathcal{C}_b}\log_2(1+\gamma_{kb}),\,C_{\rm bh}\Big)}_{R_{b,\rm bh}} \tag{4}$$

where $f_k$ denotes the index of the content requested by MS$_k$, $B$ is the downlink transmission bandwidth, $C_{\rm bh}$ is the backhaul capacity, and the function $\min(x,y)$ returns the smaller of $x$ and $y$.

The first term $R_{b,\rm ca}$ in (4) is the sum rate of the users in the $b$th cell whose requested contents are cached at the BS; these are called cache-hit users. The second term $R_{b,\rm bh}$ is the sum rate of the users whose requested contents are not cached at the BS; these are called cache-miss users, and their sum rate is limited by the backhaul capacity.
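The sketch below makes the split in (4) concrete. It computes $R_{b,\rm ca}$ and $R_{b,\rm bh}$ for one cell from given SINRs, requested content indices and the cached set $\mathcal{C}_b$. It is an illustrative helper whose name and argument layout are assumptions, not code from the paper.

```python
import math


def cell_throughput(sinrs, requested, cached, bandwidth, c_bh):
    """Split the instantaneous throughput (4) into R_{b,ca} and R_{b,bh}.

    sinrs[k] is gamma_kb of user k, requested[k] the index f_k of its content,
    and cached the set C_b of content indices stored at the BS.
    """
    r_ca = bandwidth * sum(math.log2(1 + g) for g, f in zip(sinrs, requested) if f in cached)
    r_miss = bandwidth * sum(math.log2(1 + g) for g, f in zip(sinrs, requested) if f not in cached)
    r_bh = min(r_miss, c_bh)  # cache-miss traffic is capped by the backhaul capacity
    return r_ca, r_bh


if __name__ == "__main__":
    # Three users with SINRs 3, 7, 15 (2, 3 and 4 bit/s/Hz); contents 1 and 2 are cached
    print(cell_throughput([3, 7, 15], [1, 5, 2], {1, 2}, bandwidth=1.0, c_bh=2.5))  # prints (6.0, 2.5)
```

In this example:

• the two cache-hit users contribute 2 + 4 = 6;
• the single cache-miss user's 3 bit/s/Hz is truncated to the backhaul capacity of 2.5.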


III. EE of the Cache-Enabled Network

The EE of the downlink network is defined as the ratio of the average number of bits transmitted to the average energy consumed [27-29]. This is equivalent to the ratio of the average throughput of the network to the average total power consumption at the BSs:

$$\mathrm{EE} \triangleq \frac{\mathbb{E}\left\{\sum_{b=1}^{N_b}R_b\right\}}{\mathbb{E}\left\{\sum_{b=1}^{N_b}P_{b,\rm BS}\right\}} \triangleq \frac{R}{P_{\rm tot}} \tag{5}$$

Here the expectations are taken over small-scale fading, user locations and the number of users in the network,⁶ and $P_{b,\rm BS}$ is the total power consumed at BS$_b$, which will be detailed later.

In the following, we first derive the average throughput, and 
then derive the average total power consumption, from which 
we can obtain the EE of the network. 


A. Average Throughput of the Network 

The system configuration and the caching and transmission strategies are the same for every BS, and the users are uniformly located. The average throughput of the network can therefore be obtained as

$$R = \mathbb{E}\Big\{\sum_{b=1}^{N_b}R_b\Big\} = N_b\,\mathbb{E}\{R_b\} \tag{6}$$

and the average throughput of the $b$th cell can be expressed as

$$\mathbb{E}\{R_b\} = \sum_{K_b=1}^{N_t}\sum_{K_c=0}^{K_b}P(K_b,K_c)\,\mathbb{E}\{R_b\,|\,(K_b,K_c)\} \tag{7}$$

⁶In this paper, unless otherwise specified, the expectation operator $\mathbb{E}\{\cdot\}$ is taken over all random variables (RVs) inside "$\{\cdot\}$".
where $P(K_b,K_c)$ denotes the probability that $K_b$ users are served by BS$_b$ and $K_c$ of them are cache-hit users. The term $\mathbb{E}\{R_b\,|\,(K_b,K_c)\}$ is the average throughput of the $b$th cell conditioned on the same event.

Using the conditional probability formula, we have $P(K_b,K_c) = p_{K_b}\,p_{K_c|K_b}$. Here $p_{K_b}$ is given in (2), and $p_{K_c|K_b}$ denotes the probability that $K_c$ users request contents from the local cache, given that BS$_b$ serves $K_b$ users. It can be expressed as

$$p_{K_c|K_b} = \binom{K_b}{K_c}p_h^{K_c}(1-p_h)^{K_b-K_c} \tag{8}$$

where $p_h$ is the probability that $f_k\in\mathcal{C}_b$ (i.e., the cache hit ratio). It can be obtained from the Zipf-like distribution in (1) as

$$p_h = \sum_{f=1}^{N_c}p_f = \sum_{f=1}^{N_c}\frac{f^{-\delta}}{\sum_{j=1}^{N_f}j^{-\delta}} \tag{9}$$
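Equations (8) and (9) can be checked numerically with a few lines of Python. The sketch below uses hypothetical helper names. It computes $p_h$ for a BS caching the $N_c$ most popular contents, together with the binomial probability $p_{K_c|K_b}$.

A quick consistency check is that:

• the conditional probabilities sum to one over $K_c$;
• the mean number of cache-hit users equals $K_bp_h$.

```python
import math


def hit_ratio(n_cached, n_files, delta):
    """Cache hit ratio p_h of (9) for a BS caching the n_cached most popular contents."""
    norm = sum(j ** (-delta) for j in range(1, n_files + 1))
    return sum(f ** (-delta) for f in range(1, n_cached + 1)) / norm


def hit_users_pmf(k_c, k_b, p_h):
    """p_{K_c|K_b} of (8): k_c of the k_b served users request cached contents."""
    return math.comb(k_b, k_c) * p_h ** k_c * (1.0 - p_h) ** (k_b - k_c)


if __name__ == "__main__":
    p_h = hit_ratio(100, 1000, 1.0)  # hypothetical: 10% of the catalog cached, delta = 1
    print(round(p_h, 3))
    print([round(hit_users_pmf(k, 4, p_h), 3) for k in range(5)])
```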

Without loss of generality, we assume that the contents requested by MS$_1,\cdots,$MS$_{K_c}$ are cached at BS$_b$, and that the contents requested by MS$_{K_c+1},\cdots,$MS$_{K_b}$ are not. Then, from (4), the conditional average throughput of the $b$th cell is given by

$$\mathbb{E}\{R_b\,|\,(K_b,K_c)\} = R_{\rm ca}(K_b,K_c) + R_{\rm bh}(K_b,K_c,C_{\rm bh}) \tag{10}$$

where:

• $R_{\rm ca}(K_b,K_c) = \mathbb{E}\{B\sum_{k=1}^{K_c}\log_2(1+\gamma_{kb})\}$ is the average sum rate of the cache-hit users;

• $R_{\rm bh}(K_b,K_c,C_{\rm bh}) = \mathbb{E}\{\min(B\sum_{k=K_c+1}^{K_b}\log_2(1+\gamma_{kb}),\,C_{\rm bh})\}$ is the average sum rate of the cache-miss users.

To obtain a closed-form expression of EE for further analysis, we derive approximations of $R_{\rm ca}(K_b,K_c)$ and $R_{\rm bh}(K_b,K_c,C_{\rm bh})$ in the following two lemmas.

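Lemma 1: The average sum rate of the cache-hit users can be approximated as

$$R_{\rm ca}(K_b,K_c) \approx K_cB\left(\frac{\alpha}{2\ln 2}+\log_2\frac{(N_t-K_b+1)P}{K_b\left(p_a\beta P\Phi+D^{\alpha}\sigma^2\right)}\right) \triangleq K_c\left(\frac{\alpha B}{2\ln 2}+R_e(K_b)\right) \tag{11}$$

where $\Phi$ is a constant that depends only on the path-loss exponent $\alpha$ when $N_b\to\infty$. The quantity $R_e(K_b) = B\log_2\frac{(N_t-K_b+1)P}{K_b(p_a\beta P\Phi+D^{\alpha}\sigma^2)}$ can be regarded as the average achievable rate of a cell-edge user when BS$_b$ serves $K_b$ users with unlimited-capacity backhaul.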

Proof: See Appendix A. ■ 

The approximation of $R_{\rm ca}(K_b,K_c)$ is accurate when both the SINR and […] are high.

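Lemma 2: The average sum rate of the cache-miss users can be approximated as

$$R_{\rm bh}(K_b,K_c,C_{\rm bh}) \approx \begin{cases}(K_b-K_c)\left(\dfrac{\alpha B}{2\ln 2}\gamma(K_b-K_c+1,z)+R_e(K_b)\gamma(K_b-K_c,z)\right)+C_{\rm bh}\Gamma(K_b-K_c,z), & \text{if } C_{\rm bh}>(K_b-K_c)R_e(K_b)\\[2mm] C_{\rm bh}, & \text{otherwise}\end{cases} \tag{12}$$

where:

• $z \triangleq \frac{2\ln 2}{\alpha B}\left(C_{\rm bh}-(K_b-K_c)R_e(K_b)\right)$;

• $\Gamma(k,x)\triangleq e^{-x}\sum_{i=0}^{k-1}\frac{x^i}{i!}$;

• $\gamma(k,x)\triangleq 1-e^{-x}\sum_{i=0}^{k-1}\frac{x^i}{i!}$.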

Proof: See Appendix C. ■

The approximation is accurate in the high-SINR region when […] is high and $N_t, N_b\to\infty$.
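The sketch below compares (12) with a Monte Carlo average, to show where its incomplete-gamma terms come from. It rests on one assumption, a modelling choice for illustration: each cache-miss user's rate equals $R_e(K_b)$ plus $\frac{\alpha B}{2\ln 2}$ times an independent Exp(1) variable. This is what a user placed uniformly in a disk of radius $D$ yields when the path loss is the only randomness kept, i.e., the high-SINR, fading-averaged regime behind Lemma 1.

Under that assumption:

• the sum over $K_b-K_c$ users is a shifted Gamma variable;
• $\mathbb{E}\{\min(\cdot,C_{\rm bh})\}$ reproduces the structure of (12).

The helper names and numerical values are hypothetical.

```python
import math
import random


def upper_reg_gamma(k, x):
    """Gamma(k, x) = e^{-x} sum_{i=0}^{k-1} x^i / i!, as defined below (12)."""
    return math.exp(-x) * sum(x ** i / math.factorial(i) for i in range(k))


def lower_reg_gamma(k, x):
    """gamma(k, x) = 1 - Gamma(k, x)."""
    return 1.0 - upper_reg_gamma(k, x)


def r_bh_closed_form(n_miss, r_e, c_bh, scale):
    """Approximation (12) with n_miss = K_b - K_c and scale = alpha * B / (2 ln 2)."""
    if c_bh <= n_miss * r_e:
        return c_bh
    z = (c_bh - n_miss * r_e) / scale
    return (n_miss * (scale * lower_reg_gamma(n_miss + 1, z) + r_e * lower_reg_gamma(n_miss, z))
            + c_bh * upper_reg_gamma(n_miss, z))


def r_bh_monte_carlo(n_miss, r_e, c_bh, scale, n_samples=50_000, seed=1):
    """E{min(sum of n_miss user rates, C_bh)} under the assumed model R_e + scale * Exp(1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        rate_sum = sum(r_e + scale * rng.expovariate(1.0) for _ in range(n_miss))
        total += min(rate_sum, c_bh)
    return total / n_samples


if __name__ == "__main__":
    # Normalized hypothetical values: 3 cache-miss users, R_e = 1, scale = 1, C_bh = 4.5
    print(r_bh_closed_form(3, 1.0, 4.5, 1.0), r_bh_monte_carlo(3, 1.0, 4.5, 1.0))
```

For a single cache-miss user, the closed form reduces to $R_e+\frac{\alpha B}{2\ln 2}(1-e^{-z})$, which is the form used later in the single-user analysis.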

Substituting (10) into (7) and then into (6), we obtain the network average throughput as

$$R = N_b\sum_{K_b=1}^{N_t}\sum_{K_c=0}^{K_b}p_{K_b}p_{K_c|K_b}\left(R_{\rm ca}(K_b,K_c)+R_{\rm bh}(K_b,K_c,C_{\rm bh})\right) \tag{13}$$

where $p_{K_b}$ is given in (2), $p_{K_c|K_b}$ is given in (8), and the approximations of $R_{\rm ca}(K_b,K_c)$ and $R_{\rm bh}(K_b,K_c,C_{\rm bh})$ are given in (11) and (12), respectively.

B. Average Total Power Consumption 

To gain useful insight, we consider a basic model for such cache-enabled networks that captures the fundamental challenges and tradeoffs. We extend the typical BS power consumption model in [30] to include the caching power consumption. The total power consumed at BS$_b$ can then be modeled as

$$P_{b,\rm BS} = \rho P_{b,\rm tx} + P_{b,\rm cc} + P_{b,\rm ca} + P_{b,\rm bh} \tag{14}$$

where $P_{b,\rm tx}$, $P_{b,\rm cc}$, $P_{b,\rm ca}$ and $P_{b,\rm bh}$ respectively denote the power consumed at BS$_b$ for transmitting, operating the baseband and radio frequency circuits, caching and backhauling. The factor $\rho$ reflects the impact of the power amplifier, cooling and power supply.

The transmit power of BS$_b$ is $P_{b,\rm tx} = P$ when the BS is in active mode and $P_{b,\rm tx} = 0$ when the BS is idle. The circuit power is $P_{b,\rm cc} = P_{\rm cc,a}$ in active mode and $P_{\rm cc,i}$ in idle mode.

The active statuses of the BSs are independent of each other. The total number of active BSs in the network (denoted by $N_a$) therefore follows a binomial distribution, and $\mathbb{E}\{N_a\} = N_bp_a = N_b(1-e^{-\lambda/N_b})$. Hence, the average total transmit and circuit power consumption at all BSs is

$$\mathbb{E}\Big\{\sum_{b=1}^{N_b}\left(\rho P_{b,\rm tx}+P_{b,\rm cc}\right)\Big\} = \mathbb{E}\{N_a\}(\rho P+P_{\rm cc,a})+(N_b-\mathbb{E}\{N_a\})P_{\rm cc,i} = N_b(1-e^{-\lambda/N_b})P_a+N_be^{-\lambda/N_b}P_i \tag{15}$$

where $P_a = \rho P+P_{\rm cc,a}$ and $P_i = P_{\rm cc,i}$ are the total transmit and circuit power consumption at a BS in active mode and idle mode, respectively.

The energy-proportional model is widely used in CCN [9-11] as well as in radio access networks (RAN) [14], and it enables the efficient use of caching resources. In this model, the caching power consumption is proportional to the cache capacity, i.e., $P_{b,\rm ca} = w_{\rm ca}B_{\rm ca}$ [9]. Here $B_{\rm ca}$ is the number of bits cached at BS$_b$, and $w_{\rm ca}$ is the power coefficient of the caching hardware in watt/bit.

The cached contents of each BS are fixed. When each BS caches $N_c$ contents, the average total caching power consumption of all BSs is therefore

$$\mathbb{E}\Big\{\sum_{b=1}^{N_b}P_{b,\rm ca}\Big\} = N_bP_{b,\rm ca} = N_bw_{\rm ca}N_cF \tag{16}$$



The backhauling power consumption at BS$_b$ is modeled as [31]

$$P_{b,\rm bh} = \frac{P_{\rm bh}^{\max}}{C_{\rm bh}}R_{b,\rm bh} \triangleq w_{\rm bh}R_{b,\rm bh} \tag{17}$$

where:

• $P_{\rm bh}^{\max}$ denotes the power consumed by the backhaul equipment when supporting the maximum data rate $C_{\rm bh}$;

• $w_{\rm bh} = P_{\rm bh}^{\max}/C_{\rm bh}$ is the power coefficient of the backhaul equipment;

• $R_{b,\rm bh}$ is the backhaul traffic, i.e., the sum rate of the cache-miss users as defined in (4).

Then, the average backhaul power consumption is

$$\mathbb{E}\Big\{\sum_{b=1}^{N_b}P_{b,\rm bh}\Big\} = w_{\rm bh}\mathbb{E}\Big\{\sum_{b=1}^{N_b}R_{b,\rm bh}\Big\} = w_{\rm bh}N_b\mathbb{E}\{R_{b,\rm bh}\} \tag{18}$$

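Similar to the derivations of (7) and (10), we can derive that

$$\mathbb{E}\{R_{b,\rm bh}\} = \sum_{K_b=1}^{N_t}\sum_{K_c=0}^{K_b}P(K_b,K_c)\,\mathbb{E}\{R_{b,\rm bh}\,|\,(K_b,K_c)\} = \sum_{K_b=1}^{N_t}\sum_{K_c=0}^{K_b}p_{K_b}p_{K_c|K_b}R_{\rm bh}(K_b,K_c,C_{\rm bh}) \tag{19}$$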

Then, the average total power consumption at all the BSs is

$$P_{\rm tot} = N_b\Big((1-e^{-\lambda/N_b})P_a+e^{-\lambda/N_b}P_i+w_{\rm ca}N_cF+w_{\rm bh}\sum_{K_b=1}^{N_t}\sum_{K_c=0}^{K_b}p_{K_b}p_{K_c|K_b}R_{\rm bh}(K_b,K_c,C_{\rm bh})\Big) \tag{20}$$


C. EE of the Network 

By substituting (13) and (20) into (5), the EE of the network can be obtained as (21). When the approximations of $R_{\rm ca}(K_b,K_c)$ and $R_{\rm bh}(K_b,K_c,C_{\rm bh})$ from the two lemmas are used, (21) is in closed form and becomes an approximated EE.

Although the approximated EE is in closed form, it is still too complex for further analysis. To gain useful insight into how caching impacts the network EE, in the sequel we analyze a special scenario where each BS selects one user in each time slot from its associated users [24,32].

IV. EE Analysis for the Cache-Enabled Network

In this section, we analyze the impact of several key factors on the EE and reveal their interactions for the special case where each BS serves at most one user in each time slot.⁷

In this case, the average throughput of the network in (13) degenerates into

$$R = N_bp_a\left(p_hR_{\rm ca}+(1-p_h)R_{\rm bh}\right) \tag{22}$$

where $R_{\rm ca}$ and $R_{\rm bh}$ are respectively the approximate average achievable rates of a cache-hit user and a cache-miss user. They are derived from (11) and (12) as

$$R_{\rm ca} \approx \frac{\alpha B}{2\ln 2}+R_e \tag{23}$$

⁷This can also be regarded as a special case where no more than one user exists in each cell.


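$$R_{\rm bh} \approx \begin{cases}C_{\rm bh}, & \text{if } C_{\rm bh}\le R_e\\[1mm] \dfrac{\alpha B}{2\ln 2}+R_e-\dfrac{\alpha B}{2\ln 2}\,2^{-\frac{2(C_{\rm bh}-R_e)}{\alpha B}}, & \text{otherwise}\end{cases} \tag{24}$$

and $R_e = B\log_2\frac{N_tP}{p_a\beta P\Phi+D^{\alpha}\sigma^2}$ is given by (11) with $K_b = 1$.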

Remark 1: The average throughput of the network increases with the cache hit ratio $p_h$ and the backhaul capacity $C_{\rm bh}$. In other words, we can improve the throughput by caching more contents and by increasing the backhaul capacity. Consider the case where $C_{\rm bh}$ is low and the contents do not have uniform popularity (i.e., $\delta > 0$). There, the throughput first increases rapidly with the cache size and then saturates, i.e., there is a tradeoff between throughput and memory.

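The backhauling power consumption in (18) degenerates into

$$\mathbb{E}\Big\{\sum_{b=1}^{N_b}P_{b,\rm bh}\Big\} = w_{\rm bh}N_bp_a(1-p_h)R_{\rm bh} = \begin{cases}w_{\rm bh}N_bp_a(1-p_h)C_{\rm bh}, & \text{if } C_{\rm bh}\le R_e\\[1mm] w_{\rm bh}N_bp_a(1-p_h)\left(R_{\rm ca}-\dfrac{\alpha B}{2\ln 2}\,2^{-\frac{2(C_{\rm bh}-R_e)}{\alpha B}}\right), & \text{otherwise}\end{cases} \tag{25}$$

which decreases with $p_h$ but increases with $C_{\rm bh}$.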

Substituting (22), (25), (15) and (16) into (5), the EE of the network can be approximated as

$$\mathrm{EE} \approx \frac{p_a\left(p_hR_{\rm ca}+(1-p_h)R_{\rm bh}\right)}{p_aP_a+(1-p_a)P_i+w_{\rm ca}N_cF+p_aw_{\rm bh}(1-p_h)R_{\rm bh}} \tag{26}$$

In (26):

• $p_ap_hR_{\rm ca}$ and $p_a(1-p_h)R_{\rm bh}$ are the average sum rates of the cache-hit and cache-miss users of each cell;

• $p_aP_a+(1-p_a)P_i$, $w_{\rm ca}N_cF$ and $p_aw_{\rm bh}(1-p_h)R_{\rm bh}$ are the average powers consumed by each BS for transmission and circuits, for caching and for backhauling, respectively.
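The single-user expressions (23), (24) and (26) are simple enough to code directly. The sketch below takes $R_e$ as an input rather than computing it, since $\Phi$ in (11) is only characterized as an $\alpha$-dependent constant. All parameter values in the example are hypothetical and in normalized units: rates in Mbit/s, powers in W and cache sizes in Mbit, so the EE comes out in Mbit/J.

```python
import math


def single_user_rates(alpha, bandwidth, r_e, c_bh):
    """R_ca from (23) and R_bh from (24) for one served user per cell and slot."""
    scale = alpha * bandwidth / (2 * math.log(2))
    r_ca = scale + r_e
    if c_bh <= r_e:
        r_bh = c_bh
    else:
        r_bh = r_ca - scale * 2 ** (-2 * (c_bh - r_e) / (alpha * bandwidth))
    return r_ca, r_bh


def network_ee(p_a, p_h, r_ca, r_bh, p_active, p_idle, w_ca, cache_bits, w_bh):
    """Approximate network EE of (26); cache_bits = N_c * F."""
    rate = p_a * (p_h * r_ca + (1 - p_h) * r_bh)
    power = (p_a * p_active + (1 - p_a) * p_idle   # transmission and circuits
             + w_ca * cache_bits                     # caching
             + p_a * w_bh * (1 - p_h) * r_bh)        # backhauling
    return rate / power


if __name__ == "__main__":
    # Hypothetical normalized values (rates in Mbit/s, powers in W, cache_bits in Mbit)
    r_ca, r_bh = single_user_rates(alpha=3.7, bandwidth=10.0, r_e=20.0, c_bh=25.0)
    print(r_ca, r_bh)
    print(network_ee(0.8, 0.5, r_ca, r_bh, 60.0, 20.0, 1e-4, 1e5, 0.5))
```

The two branches of (24) meet at $C_{\rm bh} = R_e$, and $R_{\rm bh}$ approaches $R_{\rm ca}$ as $C_{\rm bh}$ grows, which the helper reproduces.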

The caches in the network partly play the role of replacing the backhaul links, and the transmit power affects both the throughput and the total power consumption. Hence, the cache capacity $N_cF$, the backhaul capacity $C_{\rm bh}$ and the transmit power $P$ of each BS have an interactive impact on the EE. In what follows, for a given backhaul capacity, we separately analyze how the network EE relates to the cache capacity and to the transmit power. To simplify the analysis, we only consider the case $\delta = 1$ in the following. The impact of other values of $\delta$ will be evaluated later by simulations.


A. Relation Between Network EE and Cache Capacity 

With given backhaul capacity and transmit power, we first answer the following question: can caching at the BSs always improve the network EE?

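Proposition 1: When the following condition holds,

$$\frac{\frac{R_{\rm ca}-R_{\rm bh}}{R_{\rm bh}}\left(p_aP_a+(1-p_a)P_i\right)+p_aw_{\rm bh}R_{\rm ca}}{w_{\rm ca}F} > \sum_{i=1}^{N_f}\frac{1}{i} \tag{27}$$

caching can improve the network EE. Otherwise, caching cannot improve the EE.

Proof: See Appendix D. ■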

To help understand this condition, we consider two extreme 
cases in the following corollary. 
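$$\mathrm{EE} = \frac{\sum_{K_b=1}^{N_t}\sum_{K_c=0}^{K_b}p_{K_b}p_{K_c|K_b}\left(R_{\rm ca}(K_b,K_c)+R_{\rm bh}(K_b,K_c,C_{\rm bh})\right)}{(1-e^{-\lambda/N_b})P_a+e^{-\lambda/N_b}P_i+w_{\rm ca}N_cF+w_{\rm bh}\sum_{K_b=1}^{N_t}\sum_{K_c=0}^{K_b}p_{K_b}p_{K_c|K_b}R_{\rm bh}(K_b,K_c,C_{\rm bh})} \tag{21}$$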


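Corollary 1: When $C_{\rm bh} = 0$, caching at the BSs can always improve the network EE. When $C_{\rm bh}\to\infty$, the condition in (27) becomes

$$\frac{p_aw_{\rm bh}R_{\rm ca}}{w_{\rm ca}F} > \sum_{i=1}^{N_f}\frac{1}{i} \approx \ln N_f \tag{28}$$

Proof: When $C_{\rm bh} = 0$, it is easy to see that (27) always holds. When $C_{\rm bh}\to\infty$, it follows from (23) and (24) that $\lim_{C_{\rm bh}\to\infty}R_{\rm bh} = R_{\rm ca}$. We then substitute $R_{\rm bh} = R_{\rm ca}$ and use $\sum_{i=1}^{N_f}\frac{1}{i} = \varepsilon+\ln N_f+O(\frac{1}{N_f})$, where $\varepsilon\approx 0.577$ is the Euler-Mascheroni constant. With these, (27) becomes (28), and the approximation is accurate when $N_f\gg 1$. ■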
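Proposition 1 can be checked against a direct evaluation of (26). The sketch below assumes that (27) is the condition under which caching only the most popular content ($N_c = 1$) yields a higher EE than not caching ($N_c = 0$). It also takes $\delta = 1$, so that $p_h = \sum_{f=1}^{N_c}f^{-1}/\sum_{i=1}^{N_f}i^{-1}$. Under that reading, the two tests should agree for any positive parameters. The helper names and random parameter ranges are illustrative.

```python
import math
import random


def harmonic(n):
    """sum_{i=1}^{n} 1/i, the normalizer of the Zipf law with delta = 1."""
    return sum(1.0 / i for i in range(1, n + 1))


def ee_single_user(n_cached, n_files, r_ca, r_bh, p_a, base_power, w_ca, file_bits, w_bh):
    """EE of (26) with delta = 1 and the exact hit ratio (9); base_power = p_a P_a + (1 - p_a) P_i."""
    p_h = harmonic(n_cached) / harmonic(n_files)
    rate = p_a * (p_h * r_ca + (1 - p_h) * r_bh)
    power = base_power + w_ca * n_cached * file_bits + p_a * w_bh * (1 - p_h) * r_bh
    return rate / power


def condition_27(n_files, r_ca, r_bh, p_a, base_power, w_ca, file_bits, w_bh):
    """Left side of (27) compared against sum_{i=1}^{N_f} 1/i (requires r_bh > 0)."""
    lhs = ((r_ca - r_bh) / r_bh * base_power + p_a * w_bh * r_ca) / (w_ca * file_bits)
    return lhs > harmonic(n_files)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        args = dict(n_files=1000, r_ca=40.0, r_bh=rng.uniform(5.0, 40.0), p_a=0.8,
                    base_power=50.0, w_ca=rng.uniform(1e-3, 1.0), file_bits=10.0, w_bh=0.5)
        gain = ee_single_user(1, **args) > ee_single_user(0, **args)
        print(condition_27(**args), gain)  # the two columns should agree
```

Setting `r_bh` equal to `r_ca` in `condition_27` reproduces the unlimited-backhaul test in (28).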

Remark 2: In (28), $p_aw_{\rm bh}R_{\rm ca}$ is the average backhaul power consumption of each BS without caching. The term $w_{\rm ca}F$ is the average cache power consumption of each BS when only the most popular content is cached at each BS. This suggests that whether caching benefits EE largely depends on the power consumption parameters of the cache and backhaul hardware.

In what follows, we consider the scenario where the condition holds, and strive to answer the second question: what is the relation between the maximal EE of the network and the cache size? To this end, we first provide the cache hit ratio $p_h$ for large values of $N_c$ and $N_f$. To reflect the impact of the content catalog size $N_f$, we analyze a normalized cache capacity $\eta = N_c/N_f$, $\eta\in[0,1]$. Then, from (9) we can derive

$$p_h = \frac{\sum_{f=1}^{N_c}f^{-1}}{\sum_{j=1}^{N_f}j^{-1}} = \frac{\varepsilon+\ln N_c+O(\frac{1}{N_c})}{\varepsilon+\ln N_f+O(\frac{1}{N_f})} \approx \frac{\ln N_c}{\ln N_f} = 1+\frac{\ln\eta}{\ln N_f} \tag{29}$$

where the approximation in (29) is accurate when $N_c\gg 1$ and $N_f\gg 1$.
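The accuracy of (29) is easy to inspect. The sketch below compares the exact hit ratio for $\delta = 1$ with two approximations:

• the intermediate ratio $(\varepsilon+\ln N_c)/(\varepsilon+\ln N_f)$, which is very tight in this example;
• the final form $1+\ln\eta/\ln N_f$, which shows a visibly larger error, of a few hundredths for the smaller cache sizes here.

This extra error is the price of the simpler form used in (30). The helper names and sizes are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def hit_ratio_exact(n_cached, n_files):
    """p_h from (9) with delta = 1."""
    return sum(1.0 / f for f in range(1, n_cached + 1)) / sum(1.0 / j for j in range(1, n_files + 1))


def hit_ratio_approx(eta, n_files):
    """Approximation (29): p_h ~ 1 + ln(eta) / ln(N_f), with eta = N_c / N_f."""
    return 1.0 + math.log(eta) / math.log(n_files)


if __name__ == "__main__":
    n_files = 10_000
    for n_cached in (100, 1000, 5000):
        eta = n_cached / n_files
        exact = hit_ratio_exact(n_cached, n_files)
        intermediate = (EULER_GAMMA + math.log(n_cached)) / (EULER_GAMMA + math.log(n_files))
        print(n_cached, round(exact, 4), round(intermediate, 4), round(hit_ratio_approx(eta, n_files), 4))
```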

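By substituting (29) into (26), we can approximate the network EE as

$$\mathrm{EE} \approx \frac{p_a\left(R_{\rm bh}+(R_{\rm ca}-R_{\rm bh})\left(1+\frac{\ln\eta}{\ln N_f}\right)\right)}{p_aP_a+(1-p_a)P_i+w_{\rm ca}\eta N_fF-p_aw_{\rm bh}R_{\rm bh}\frac{\ln\eta}{\ln N_f}} \tag{30}$$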


Denote $W(x)$ as the Lambert-W function satisfying $W(x)e^{W(x)} = x$. Then, the relation between EE and cache capacity is shown in the following proposition.

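Proposition 2: The solution of the equation $\frac{\partial\mathrm{EE}}{\partial\eta}\big|_{\eta=\eta_0} = 0$ is

$$\eta_0 = \frac{\Omega}{N_fW\left(\frac{\Omega}{e}N_f^{\frac{R_{\rm bh}}{R_{\rm ca}-R_{\rm bh}}}\right)} \tag{31}$$

where

$$\Omega = \frac{p_aP_a+(1-p_a)P_i+p_aw_{\rm bh}\frac{R_{\rm ca}R_{\rm bh}}{R_{\rm ca}-R_{\rm bh}}}{w_{\rm ca}F} \tag{32}$$

When $\eta_0 < 1$, the EE-maximal normalized cache capacity is $\eta^* = \eta_0$. When $\eta_0 > 1$, $\eta^* = 1$.

Proof: See Appendix E. ■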

Remark 3: If $\eta_0 < 1$, the EE will first increase and then decrease with the cache capacity. Otherwise, if $\eta_0 > 1$, the EE will be maximized when all contents in the catalog are cached at each BS, i.e., there is a tradeoff between the maximal EE and the cache size.

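To understand when the EE-memory tradeoff exists, we rewrite (31) in the form of $x/W(x)$ as

$$\eta_0 = \frac{x}{W(x)}\cdot\frac{e^{1-\frac{R_{\rm bh}}{R_{\rm ca}-R_{\rm bh}}\ln N_f}}{N_f}, \qquad x = \frac{\Omega}{e}\,e^{\frac{R_{\rm bh}}{R_{\rm ca}-R_{\rm bh}}\ln N_f} \tag{33}$$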


As shown in (32), $\Omega$ increases with the average power consumed for transmission and circuits at each BS, $p_aP_a+(1-p_a)P_i$, and with the backhaul power coefficient $w_{\rm bh}$. It decreases with the content size $F$ and the cache power coefficient $w_{\rm ca}$. Further, $x/W(x)$ is an increasing function of $x$ [33]. Hence, $\eta_0$ increases with $p_aP_a+(1-p_a)P_i$ and $w_{\rm bh}$, and decreases with $F$ and $w_{\rm ca}$. Moreover, (31) shows that $\eta_0$ increases when the content catalog size $N_f$ decreases, since $W(x)$ is an increasing function of $x$ [33].
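The closed form in Proposition 2 can be sanity-checked numerically. The Python sketch below is only an illustration under stated assumptions:

• it implements the principal branch of $W(x)$ with a plain Newton iteration, rather than a library routine;
• it evaluates (31) and (32);
• it compares $\eta_0$ with the maximizer of (30) found on a fine grid.

The parameter values are hypothetical, normalized numbers chosen so that $\eta_0<1$; they are not taken from the paper's simulation setup.

```python
import math


def lambert_w(x, tol=1e-12):
    """Principal branch W(x) for x > 0, by Newton's method on w * e^w - x = 0."""
    w = math.log1p(x)  # starts at or above W(x), so the iteration decreases monotonically
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol * (1.0 + abs(w)):
            break
    return w


def eta0_prop2(r_ca, r_bh, p_a, base_power, w_ca, file_bits, w_bh, n_files):
    """eta_0 from (31)-(32); base_power = p_a P_a + (1 - p_a) P_i."""
    omega = (base_power + p_a * w_bh * r_ca * r_bh / (r_ca - r_bh)) / (w_ca * file_bits)
    x = omega / math.e * n_files ** (r_bh / (r_ca - r_bh))
    return omega / (n_files * lambert_w(x))


def ee_eq30(eta, r_ca, r_bh, p_a, base_power, w_ca, file_bits, w_bh, n_files):
    """Approximate EE of (30) as a function of the normalized cache capacity eta."""
    frac = math.log(eta) / math.log(n_files)
    rate = p_a * (r_bh + (r_ca - r_bh) * (1.0 + frac))
    power = base_power + w_ca * eta * n_files * file_bits - p_a * w_bh * r_bh * frac
    return rate / power


if __name__ == "__main__":
    # Hypothetical normalized parameters chosen so that eta_0 < 1
    params = dict(r_ca=40.0, r_bh=30.0, p_a=0.8, base_power=50.0,
                  w_ca=1e-3, file_bits=10.0, w_bh=0.5, n_files=1000)
    eta0 = eta0_prop2(**params)
    grid = [i / 10_000 for i in range(10, 10_001)]  # eta from 1e-3 to 1
    eta_best = max(grid, key=lambda eta: ee_eq30(eta, **params))
    print(round(eta0, 4), eta_best)
```

For large arguments, `n_files ** (r_bh / (r_ca - r_bh))` can overflow; working in the log domain would avoid that, at the cost of a less direct transcription of (31).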

Remark 4: $\eta_0 > 1$ for systems with high transmit power, large circuit and backhauling power consumption, power-saving caching hardware, a small content size $F$ and a small catalog size $N_f$. Otherwise, $\eta_0 < 1$, in which case caching more contents is not always energy efficient.

To further identify the key impacting factors on network 
EE and gain useful insight on network configuration, in what 
follows we consider the case when backhaul capacity is 
unlimited. 

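1) An Extreme Case of $C_{\rm bh}\to\infty$: In this case, $\lim_{C_{\rm bh}\to\infty}R_{\rm bh} = R_{\rm ca}$. Then, the network EE in (30) can be expressed as

$$\mathrm{EE} = \frac{p_aR_{\rm ca}}{p_aP_a+(1-p_a)P_i+w_{\rm ca}\eta N_fF-p_aw_{\rm bh}R_{\rm ca}\frac{\ln\eta}{\ln N_f}} \triangleq \frac{p_aR_{\rm ca}}{p_aP_a+(1-p_a)P_i+P_{\rm ca}+P_{\rm bh}} \tag{34}$$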


Remark 5: In (34), only the powers consumed for caching and backhauling depend on $\eta$. $P_{\rm ca}$ increases linearly with $\eta$, while $P_{\rm bh}$ decreases with $\eta$, first rapidly and then slowly. As a result, the total power consumption first decreases and then increases with $\eta$. Hence, the relation between the network EE and the cache capacity relies on the trade-off between backhauling and caching powers.

From (34) and the expression of $R_{\rm ca}$ in (23), we obtain the following corollary.

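Corollary 2: When $C_{\rm bh}\to\infty$, the solution of the equation $\frac{\partial\mathrm{EE}}{\partial\eta}\big|_{\eta=\eta_0} = 0$ is

$$\eta_0 = \frac{p_aw_{\rm bh}B}{w_{\rm ca}FN_f\ln N_f}\left(\frac{\alpha}{2\ln 2}+\log_2\frac{N_t}{p_a\beta\Phi+\left(\frac{P}{D^{\alpha}\sigma^2}\right)^{-1}}\right) \tag{35}$$

where $\Phi$ is the constant depending only on $\alpha$, and $\frac{P}{D^{\alpha}\sigma^2}$ is the average cell-edge signal-to-noise ratio (SNR).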
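Because (35) is explicit, its claims can be probed numerically. The sketch below takes three steps:

1. it evaluates $R_{\rm ca}$ from (23) with $K_b = 1$, written in terms of the cell-edge SNR;
2. it computes $\eta_0$ from (35);
3. it checks that the derivative of the denominator of (34) vanishes at $\eta_0$.

The value of $\Phi$ is not given in this section, so `phi=1.0` is a placeholder assumption, as are all other numbers and helper names.

```python
import math


def r_ca_single_user(bandwidth, alpha, n_antennas, p_a, beta, phi, snr_edge):
    """R_ca of (23) with R_e from (11) at K_b = 1, using the cell-edge SNR P / (D^alpha sigma^2)."""
    return bandwidth * (alpha / (2 * math.log(2))
                        + math.log2(n_antennas / (p_a * beta * phi + 1.0 / snr_edge)))


def eta0_unlimited(p_a, w_bh, w_ca, file_bits, n_files, r_ca):
    """eta_0 of (35), i.e., p_a w_bh R_ca / (w_ca F N_f ln N_f)."""
    return p_a * w_bh * r_ca / (w_ca * file_bits * n_files * math.log(n_files))


def power_eq34(eta, p_a, w_bh, w_ca, file_bits, n_files, r_ca, base_power):
    """Denominator of (34); the numerator p_a R_ca does not depend on eta."""
    return (base_power + w_ca * eta * n_files * file_bits
            - p_a * w_bh * r_ca * math.log(eta) / math.log(n_files))


if __name__ == "__main__":
    # Hypothetical values; phi = 1.0 is a placeholder for the alpha-dependent constant Phi
    for n_antennas in (1, 2, 4, 8):
        r_ca = r_ca_single_user(10.0, 3.7, n_antennas, 0.8, 0.5, 1.0, 100.0)
        print(n_antennas, round(eta0_unlimited(0.8, 0.5, 1e-3, 10.0, 1000, r_ca), 4))
```

The printed values grow with the number of antennas, in line with the statement that $\eta_0$ increases with $N_t$.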

Remark 6: As shown in (35), $\eta_0$ increases with $N_t$ and $P$. This suggests that a BS with more antennas and higher transmit power should cache more contents to achieve the maximal EE.

According to Proposition 2, when $\eta_0 > 1$, there exists a trade-off between EE and $\eta$. Note that $y = x\ln x$ can be rewritten as $x = e^{W(y)}$. From $\eta_0 > 1$ and (35), we then obtain the following corollary.

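Corollary 3: When $C_{\rm bh}\to\infty$, there exists a trade-off between EE and $\eta$ if $N_f < N_{\rm th}$, where

$$N_{\rm th} = e^{W\left(\frac{p_aw_{\rm bh}B}{w_{\rm ca}F}\left(\frac{\alpha}{2\ln 2}+\log_2\frac{N_t}{p_a\beta\Phi+\left(\frac{P}{D^{\alpha}\sigma^2}\right)^{-1}}\right)\right)} = e^{W\left(\frac{p_aw_{\rm bh}R_{\rm ca}}{w_{\rm ca}F}\right)} \tag{36}$$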
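The threshold in (36) only requires the principal branch of the Lambert-W function. The sketch below computes $N_{\rm th} = e^{W(y)}$ with $y = p_aw_{\rm bh}R_{\rm ca}/(w_{\rm ca}F)$, using a simple Newton iteration (an implementation choice, not the paper's). It then checks that $N_{\rm th}\ln N_{\rm th} = y$, i.e., that $\eta_0$ in (35) equals one exactly at $N_f = N_{\rm th}$ when $N_f$ is treated as continuous. Parameter values and helper names are hypothetical.

```python
import math


def lambert_w(x, tol=1e-12):
    """Principal branch W(x) for x > 0, by Newton's method on w * e^w - x = 0."""
    w = math.log1p(x)  # starts at or above W(x), so the iteration decreases monotonically
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol * (1.0 + abs(w)):
            break
    return w


def catalog_threshold(p_a, w_bh, r_ca, w_ca, file_bits):
    """N_th = exp(W(p_a w_bh R_ca / (w_ca F))) from (36)."""
    return math.exp(lambert_w(p_a * w_bh * r_ca / (w_ca * file_bits)))


if __name__ == "__main__":
    # Hypothetical normalized values; catalogs smaller than N_th show the EE-memory trade-off
    print(round(catalog_threshold(0.8, 0.5, 59.55, 1e-3, 10.0), 1))
```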


Remark 7: As shown in (36), when the average cell-edge SNR is high, the interference level $\beta$ dominates the value of $R_{\rm ca}$. If the interference can be reduced to a low level, $R_{\rm ca}$ will increase and the value of $N_{\rm th}$ will be large. The EE-memory trade-off will then exist even for a large content catalog.

Again according to Proposition 2, when $\eta_0 < 1$, the EE-optimal normalized cache capacity is $\eta^* = \eta_0$. From (35), we can further analyze the impact of network density.

Corollary 4: Let $C_{\rm bh}\to\infty$, and fix the total coverage area of the cells $N_b\pi D^2$. Then $\eta^* = \eta_0$ decreases with $N_b$, and $N_b\eta^*$ increases with $N_b$ for […] 0.

Proof: See Appendix E. ■ 

Remark 8: Corollary 4 indicates that when the network becomes denser, each BS should cache fewer contents, but the total cache capacity of the network should increase in order to maximize the network EE. Further considering that $\eta_0$ increases with $N_t$ and $P$ as mentioned in Remark 6, this implies that a pico BS should cache fewer contents than a macro BS to achieve the maximal EE.
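The density trend in Corollary 4 can be illustrated numerically. In this sketch the total area is fixed, so $D^{\alpha}$ scales as $N_b^{-\alpha/2}$ and the cell-edge SNR as $N_b^{\alpha/2}$; we pin the SNR to 20 dB at $N_b = 37$ and take $p_a = 1 - e^{-\lambda/N_b}$ with $\lambda = 30$. These choices and the remaining parameter values are assumptions made only for illustration.

```python
import math

# Hypothetical parameters; the cell-edge SNR is pinned to 20 dB at N_b = 37 (assumed).
B, ALPHA, BETA, PHI = 20e6, 3.67, 0.5, 0.0
W_BH, W_CA, F, N_F, N_T = 5e-8, 6.25e-12, 30 * 8e6, 10_000, 4
LAM = 30.0                      # average number of users in the fixed area
SNR_REF, N_REF = 100.0, 37      # cell-edge SNR at the reference density


def eta_0(n_b):
    """(35) for a fixed total area N_b*pi*D^2: p_a = 1 - exp(-lambda/N_b), D^2 ~ 1/N_b."""
    p_a = 1 - math.exp(-LAM / n_b)
    snr_edge = SNR_REF * (n_b / N_REF) ** (ALPHA / 2)   # D^alpha scales as N_b^(-alpha/2)
    rate = B * (ALPHA / (2 * math.log(2))
                + math.log2(N_T / (p_a * BETA * 2 ** PHI + 1 / snr_edge)))
    return p_a * W_BH * rate / (W_CA * F * N_F * math.log(N_F))


densities = list(range(20, 201))
per_bs = [eta_0(n) for n in densities]                 # eta_0: normalized cache per BS
network = [n * e for n, e in zip(densities, per_bs)]   # N_b * eta_0: total cache
print(f"N_b=20: {per_bs[0]:.4f}, N_b=200: {per_bs[-1]:.4f}")
```

Over this range $\eta_0$ falls while $N_b\eta_0$ rises, in line with Remark 8; the sketch illustrates the trend and does not replace the proof.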

Since (35) gives the optimal cache capacity maximizing the network EE when $\eta_0 < 1$, we can further analyze the impacts of different factors on the maximal EE gain brought by caching.

Corollary 5: When $C_{bh} \to \infty$ and $\eta_0 < 1$, the gain of the maximal EE with caching over that without caching is

$$\mathrm{EE}_{gain} = \frac{G}{1-G} \qquad (37)$$

where

$$G = \frac{\frac{1}{\ln N_f}\ln\frac{p_a w_{bh} R_{ca}}{w_{ca} F \ln N_f} - \frac{1}{\ln N_f}}{\frac{p_a P_a + (1-p_a) P_i}{p_a w_{bh} R_{ca}} + 1} \qquad (38)$$

Proof: By substituting (35) into (34), we can obtain the maximal EE, denoted as $\mathrm{EE}_{max}$. Denoting the network EE without caching (i.e., $N_c = 0$) as $\mathrm{EE}_{no}$, the maximal EE gain with caching over that without caching is $\mathrm{EE}_{gain} = \frac{\mathrm{EE}_{max} - \mathrm{EE}_{no}}{\mathrm{EE}_{no}}$, which can be written as (37). ■


Remark 9: As shown in (38), $G$ increases with $R_{ca}$, since the numerator increases with $R_{ca}$ while the denominator decreases with $R_{ca}$. Because $\mathrm{EE}_{gain}$ increases with $G$ and, as mentioned before, $R_{ca}$ largely depends on the interference level $\beta$, the EE gain of caching at the BSs can be improved significantly by mitigating ICI. We can also see from (38) that $G$ increases when the ratio of the total transmit and circuit power to the backhauling power without caching, i.e., $\frac{p_a P_a + (1-p_a) P_i}{p_a w_{bh} R_{ca}}$, decreases. This implies that caching at the pico BSs may provide higher EE gain than caching at the macro BSs, since backhaul power consumption usually takes a larger portion of the energy in the pico cells [34].
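The closed form in Corollary 5 can be cross-checked against a direct evaluation of (34). The sketch below uses hypothetical parameter values and assumes the $\delta = 1$ hit ratio $p_h \approx \ln(\eta N_f)/\ln N_f$; it computes $G$ from (38) and $\mathrm{EE}_{gain} = G/(1-G)$ from (37), and compares the result with $(\mathrm{EE}_{max} - \mathrm{EE}_{no})/\mathrm{EE}_{no}$ evaluated at $\eta_0$ from (35).

```python
import math

# Hypothetical parameters (Phi = 0, w_bh and P_STATIC are assumptions).
B, ALPHA, P_A, BETA, PHI = 20e6, 3.67, 0.8, 0.5, 0.0
W_BH, W_CA, F, N_F, N_T, SNR_EDGE = 5e-8, 6.25e-12, 30 * 8e6, 10_000, 4, 100.0
P_STATIC = 8.0   # p_a*P_a + (1 - p_a)*P_i [W]

R_CA = B * (ALPHA / (2 * math.log(2))
            + math.log2(N_T / (P_A * BETA * 2 ** PHI + 1 / SNR_EDGE)))
L = math.log(N_F)
BH = P_A * W_BH * R_CA            # backhaul power without caching, p_a*w_bh*R_ca


def gain_closed_form():
    """EE_gain = G / (1 - G) with G from (38)."""
    numerator = math.log(BH / (W_CA * F * L)) / L - 1 / L
    denominator = P_STATIC / BH + 1
    g = numerator / denominator
    return g / (1 - g)


def gain_direct():
    """(EE_max - EE_no) / EE_no, evaluating (34) at eta_0 from (35)
    with the assumed hit ratio p_h = ln(eta*N_f)/ln(N_f)."""
    eta0 = BH / (W_CA * F * N_F * L)
    p_h = math.log(eta0 * N_F) / L
    power_cache = P_STATIC + W_CA * eta0 * N_F * F + BH * (1 - p_h)
    power_no_cache = P_STATIC + BH
    # The throughput p_a*R_ca is identical in both cases, so the EE ratio is a power ratio.
    return power_no_cache / power_cache - 1


print(f"closed form: {gain_closed_form():.4f}, direct: {gain_direct():.4f}")
```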

When $\eta_0 > 1$, the results are similar and the conclusion is the same.


B. Relation Between Network EE and Transmit Power 

When the backhaul capacity is unlimited, by substituting $R_{ca}$ in (23), and $P_a$ and $P_i$ in (15) into (34), the network EE can be expressed as a function of the transmit power $P$ as (39).

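Corollary 6: When $C_{bh} \to \infty$ and the network is interference-limited, i.e., $p_a \beta P 2^{\Phi} \gg D^{\alpha}\sigma^2$,⁸ the EE decreases with the transmit power $P$.

Proof: Since $p_a \beta P 2^{\Phi} \gg D^{\alpha}\sigma^2$, by omitting the term $D^{\alpha}\sigma^2$ in (39), the throughput no longer depends on $P$ while the power consumption increases with $P$, so the EE decreases with the transmit power $P$. ■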

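Corollary 7: When $C_{bh} \to \infty$ and the network is noise-limited, i.e., $p_a \beta P 2^{\Phi} \ll D^{\alpha}\sigma^2$, the EE first increases and then decreases with the transmit power, and the optimal transmit power maximizing the EE is

$$P_0 = \frac{\bar{P}_{cc} + \bar{P}_{ca}}{p_a \rho\, W\!\left(\frac{N_t (\bar{P}_{cc} + \bar{P}_{ca})\, e^{\frac{\alpha}{2} - 1}}{p_a \rho D^{\alpha}\sigma^2}\right)} \qquad (40)$$

where $\bar{P}_{cc} = p_a P_{cc_a} + (1-p_a) P_{cc_i}$ is the average circuit power consumption of each BS, and $\bar{P}_{ca} = w_{ca} \eta N_f F$ is the average cache power consumption of each BS.

Proof: See Appendix G. ■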

Remark 10: As shown in (40), $P_0$ increases with $\bar{P}_{ca}$, since $x/W(x)$ increases with $x$. This means that the transmit power should increase with the cache capacity to maximize the EE.
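Corollary 7 can be checked by brute force. The sketch below implements (40) with a Newton-based Lambert W and compares it with a grid search over $P$ of the noise-limited form of (39). The circuit powers follow the pico values listed in Section V (3.85 W and 10.16 W, assigned here to the idle and active states), while $D^{\alpha}\sigma^2 = 10^{-3}$ W, $w_{bh}$ and the hit ratio $p_h$ are assumed values. Because the backhaul power scales with the throughput, it cancels in $\partial \mathrm{EE}/\partial P = 0$, so $p_h$ and $w_{bh}$ should not move the optimum.

```python
import math


def lambert_w(x, tol=1e-12):
    """Principal-branch Lambert W for x >= 0 via Newton's method (solves w*e**w = x)."""
    w = math.log1p(x)
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1))
        w -= step
        if abs(step) < tol * (1 + abs(w)):
            break
    return w


# Hypothetical noise-limited setting (interference term dropped).
B, ALPHA, N_T, P_A, RHO = 20e6, 3.67, 4, 0.8, 15.13
P_CC = P_A * 10.16 + (1 - P_A) * 3.85   # average circuit power [W]
NOISE_EDGE = 1e-3                        # D^alpha * sigma^2 [W] (assumed)
W_BH, P_H = 5e-8, 0.3                    # backhaul coefficient and hit ratio (assumed)


def p_opt(p_ca):
    """Optimal transmit power (40) for average cache power p_ca."""
    s = P_CC + p_ca
    arg = N_T * s * math.exp(ALPHA / 2 - 1) / (P_A * RHO * NOISE_EDGE)
    return s / (P_A * RHO * lambert_w(arg))


def ee(p, p_ca):
    """EE of (39) with the interference term dropped (noise-limited)."""
    rate = B * (ALPHA / (2 * math.log(2)) + math.log2(N_T * p / NOISE_EDGE))
    power = P_A * RHO * p + P_CC + p_ca + P_A * W_BH * (1 - P_H) * rate
    return P_A * rate / power


P_CA = 0.75
grid = [10 ** (-3 + 4 * i / 40_000) for i in range(40_001)]   # 1 mW ... 10 W
best = max(grid, key=lambda p: ee(p, P_CA))
print(f"(40): {p_opt(P_CA):.4f} W, grid search: {best:.4f} W")
```

The last check in the test mirrors Remark 10: a larger $\bar{P}_{ca}$ yields a larger $P_0$, because $x/W(x) = e^{W(x)}$ is increasing.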

We can show that the EE is not jointly concave in $\eta$ and $P$, even though the EE is a unimodal function of $\eta$ and of $P$, respectively, when the network is noise-limited. Therefore, the point $(P_0, \eta_0)$ satisfying $\frac{\partial \mathrm{EE}}{\partial P} = 0$ in (40) and $\frac{\partial \mathrm{EE}}{\partial \eta} = 0$ in (35) may not be jointly optimal. In the next section, we provide numerical results to show that $(P_0, \eta_0)$ is jointly optimal in the considered system setup.

When the backhaul capacity is very low, i.e., $C_{bh} \to 0$, almost the same results and conclusions can be obtained, which are not shown for conciseness.

$$\mathrm{EE} = \frac{p_a B\left(\frac{\alpha}{2\ln 2} + \log_2\frac{N_t P}{p_a \beta P 2^{\Phi} + D^{\alpha}\sigma^2}\right)}{p_a(\rho P + P_{cc_a}) + (1-p_a)P_{cc_i} + w_{ca} N_c F + p_a w_{bh} B (1-p_h)\left(\frac{\alpha}{2\ln 2} + \log_2\frac{N_t P}{p_a \beta P 2^{\Phi} + D^{\alpha}\sigma^2}\right)} \qquad (39)$$

⁸This condition can be rewritten as $\beta \gg \frac{1}{p_a 2^{\Phi}}\frac{D^{\alpha}\sigma^2}{P}$, which is $\beta \gg 0.015$ for $p_a = 0.8$ and 20 dB cell-edge SNR.

From the previous analysis in this section, we can draw the following conclusions.

• If the backhaul capacity is unlimited, then the average throughput of the network will not change whether or not each BS is equipped with a cache. If the backhaul has limited capacity, there will exist a tradeoff between throughput and memory.

• Whether caching at the BSs brings an EE gain depends 
on the backhaul capacity, and the power consumption 
parameters of the cache and backhaul hardware. 

• If the backhaul capacity is unlimited, the EE gain of 
caching will come from trading off the backhaul power 
consumption with the cache power consumption. If the 
backhaul capacity is limited, the caching gain will come 
from both the increase of network throughput and the 
decrease of backhaul power consumption. 

• When the content catalog size is small, there is a tradeoff 
between EE and memory. Otherwise, the cache size of 
each BS should be optimized to maximize the EE of the 
network. 

V. Numerical and Simulation Results 

In this section, we validate the analysis and evaluate the 
EE of the cache-enabled networks. We show when caching at 
BSs has EE gain and how much gain we can expect in real 
systems. 

While in the derivation we have assumed circular cells, in the simulation we consider a hexagonal region with radius 250 m. To demonstrate the impact of interference, we deploy three tiers of hexagonal pico cells in the region. Then, $N_b = 37$, and the radius of each pico cell is $D \approx 40$ m. In order to remove the boundary effect, we deploy three more tiers of cells to ensure that every cell is surrounded by no fewer than three tiers of cells. Each pico BS is equipped with four antennas, and the transmission bandwidth is set as 20 MHz. The noise power is set as $\sigma^2 = -95$ dBm and the path-loss model is $30.6 + 36.7\log_{10}(r_{kb})$ in dB [35].⁹ The catalog contains $N_f$ = 10^ contents, each with size $F = 30$ MB (megabytes) [7]. Recall that the EE analysis in Section IV is obtained for a special scenario where each BS serves at most one user. To show that the analytical results also hold for more general scenarios, in the following, each BS can schedule at most $N_t$ users with ZFBF. The user distribution in the whole network follows a PPP, and the average number of users in the network is $\lambda = 30$. Then, the ratio of user density to BS density is $\lambda/N_b \approx 0.8$.¹⁰ The popularity of the contents follows a Zipf-like distribution with the typical skew parameter $\delta = 0.8$ [38]. The power consumption parameters of the system are $\rho = 15.13$ and $P = 21$ dBm; for a typical pico BS [27], $P_{cc_i} = 3.85$ W and $P_{cc_a} = 10.16$ W; $w_{bh}$ = 5 x 10”'’ J/bit for the microwave backhaul link [31]; and $w_{ca} = 6.25 \times 10^{-12}$ W/bit for high-speed SSD [9]. Unless otherwise specified, the above setups will be used for all simulations and numerical results.

⁹In practice, the propagation environment may change, and line-of-sight (LoS) paths may exist between the BS and the user with a certain probability. In this scenario, the EE will decrease due to stronger ICI, but the EE-cache size relation will not change.

¹⁰The ratio of user density to BS density is typically around one for SCNs [23,36] and is much smaller than one for future UDNs in 5G [37].


A. Validation of the Analysis 

To validate the assumption that the energy consumption for content update is negligible when content popularity changes slowly, we estimate the energy consumption for updating contents. Suppose that $u$ percent of the cached contents are updated at interval $T$. Then, the percentage of energy consumption for content update to the total energy consumption during $T$ is

$$E_{ud} = \frac{u N_b N_c F w_{bh}}{T P_{tot}} \qquad (41)$$

where $u N_b N_c F$ is the total number of bits conveyed through the backhaul links, and thus $u N_b N_c F w_{bh}$ is the energy consumed for updating contents, while $T P_{tot}$ is the total energy consumed by the network during $T$. Considering that the popularity of many contents changes slowly,¹¹ we set $u = 10\%$ and $T = 12$ hours. Then, when $N_c$ = 10°, $E_{ud} < 3\%$.

To validate the approximation made for $\mathrm{E}\{\log_2(\beta I_k + \sigma^2/P)\}$ in Appendix A, we compare the simulation results of this term with the numerical results of its approximation given in (A.5) in Fig. 1. Since the term depends on $p_a = 1 - e^{-\lambda/N_b}$ and $\beta$, the results for different values of $\lambda/N_b$ and $\beta$ are provided. We can see that the simulation and numerical results almost overlap for all values of $\beta \in [0,1]$, especially when $\lambda/N_b$ is high, i.e., the approximation is accurate.

Fig. 1. The accuracy of the approximation of $\mathrm{E}\{\log_2(\beta I_k + \sigma^2/P)\}$.


To validate the approximation introduced in (C.1), we compare the simulation results of the average throughput per cell with the numerical results obtained from (13) versus $C_{bh}$ in Fig. 2(a). We can see that the simulation and numerical results almost overlap, i.e., the approximation is accurate, although $N_t = 4$ and $N_b = 37$ are far from infinity. To show the impact of caching on the throughput of the network, we also provide the numerical results obtained from (13) versus $\eta$ in Fig. 2(b). We can see from Fig. 2(a) and Fig. 2(b) that the throughput increases with both the backhaul capacity and the cache capacity, which agrees with the result in (22) derived in the special scenario. Moreover, the throughput increases with $\eta$ more sharply when $\beta$ is small. This suggests that the throughput can be boosted more efficiently by caching at the BSs if the ICI level can be reduced.

¹¹For example, new movies are posted (or change popularity) every week, and new music videos are posted about every month [12].

(a) Average throughput versus $C_{bh}$, $\eta = 0.1$.
(b) Average throughput versus $\eta$, $C_{bh} = 100$ Mbps.
Fig. 2. Average throughput versus backhaul capacity and cache capacity.

B. When Does EE Benefit from Caching?

In Table I, we use numerical results to show when the condition in (27) holds for different content catalog sizes $N_f$, backhaul hardware and cache hardware.

A typical pico BS in an LTE system is considered, where the transmission and power consumption parameters have been defined at the beginning of this section. The interference level is set as $\beta = 1$. In such a worst case, the condition is more prone to be invalid. While there are various kinds of memory technologies, we consider the two kinds that are most likely to be employed due to their higher power efficiencies and larger cache sizes. Besides the high-speed SSD cache hardware with $w_{ca} = 6.25 \times 10^{-12}$ W/bit and the microwave backhaul link with $w_{bh}$ = 5 X 10“^ J/bit, we also consider DRAM as cache hardware and optical fiber as backhaul link (with capacity 1 Gbps), whose power coefficients are $w_{ca} = 2.5 \times 10^{-9}$ W/bit [9] and $w_{bh}$ = 4 x 10"® J/bit [9,14], respectively. Considering that $N_f$ has a wide range in the literature, e.g., $N_f$ = 10 ^ ~ 10 ® with a large content size $F$ = 10^ ~ 10® MB [39,40] and $N_f$ = 10"^ ~ 10® with a small content size $F = 1$ to $10$ MB [12,41], we also investigate the impact of $N_f$ and $F$ on the condition.

TABLE I
Numerical Example, δ = 1

Condition (27) | LHS   | RHS  | w_ca | w_bh          | N_f  | F
Hold           | 0.006 | 34.4 | SSD  | microwave     | 10^5 | 10 MB
Hold           | 0.006 | 2.31 | SSD  | optical fiber | 10^5 | 10 MB
Hold           | 0.37  | 2.31 | SSD  | optical fiber | 10^3 | 10^3 MB
Hold           | 2.41  | 34.4 | DRAM | microwave     | 10^5 | 10 MB
Not hold       | 2.41  | 2.31 | DRAM | optical fiber | 10^5 | 10 MB
Not hold       | 149.7 | 34.4 | DRAM | microwave     | 10^3 | 10^3 MB


As expected, when the value of $w_{ca}$ is large and $w_{bh}$ is small, the EE does not benefit from caching at the BSs. Moreover, with the same value of $N_f F$, the condition is more prone to be invalid when the content size $F$ is large.
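Assuming the LHS of (27) takes the form $w_{ca}F\sum_{j=1}^{N_f} j^{-\delta}$ (our reading of Appendix D), with $\delta = 1$ and 1 MB $= 8\times 10^{6}$ bits, the LHS column of Table I can be reproduced with a few lines of Python. This is a consistency check under those assumptions, not the authors' code; the RHS depends on rates that are not reproduced here.

```python
def lhs_27(w_ca, n_f, f_mb, delta=1.0):
    """w_ca * F * sum_{j=1}^{N_f} j^(-delta), with F converted from MB to bits."""
    f_bits = f_mb * 8e6   # assumption: 1 MB = 10^6 bytes
    return w_ca * f_bits * sum(j ** -delta for j in range(1, n_f + 1))


SSD, DRAM = 6.25e-12, 2.5e-9   # cache power coefficients [W/bit]
rows = {
    ("SSD", 10**5, 10): lhs_27(SSD, 10**5, 10),
    ("SSD", 10**3, 10**3): lhs_27(SSD, 10**3, 10**3),
    ("DRAM", 10**5, 10): lhs_27(DRAM, 10**5, 10),
    ("DRAM", 10**3, 10**3): lhs_27(DRAM, 10**3, 10**3),
}
for key, value in rows.items():
    print(key, round(value, 3))
```

The factor of 400 between the SSD and DRAM rows is simply the ratio of the two cache power coefficients.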

C. Impact of Key Parameters on EE 

In Fig. 3, we show the numerical results of EE obtained from (21) versus the backhaul capacity and the normalized cache capacity, respectively. We can see from Fig. 3(a) that when no content or only a small portion of the contents is cached at each BS (i.e., $\eta = 0$ and $0.001$), the EE increases with the backhaul capacity, and when the portion is large (i.e., $\eta = 0.01, 0.1$), the EE decreases with $C_{bh}$. This is because although the throughput increases with $C_{bh}$, the backhaul power consumption also increases with more backhaul traffic. Moreover, the EE gain of caching over not caching is high when the backhaul capacity is very limited, and the gain approaches a constant when $C_{bh}$ is large, say 200 Mbps. Fig. 3(b) shows that when the catalog size $N_f$ is relatively small (i.e., $N_f < N_{th}$), say $N_f = 5000$, the EE increases with $\eta$ until all contents are cached, and the maximal EE gain of caching over not caching is about 575% when $\beta = 0$ and 250% when $\beta = 1$. When $N_f$ is large (i.e., $N_f > N_{th}$), the EE first increases and then decreases with $\eta$. In fact, we can compute the values of $N_{th}$ from (36) for unlimited-capacity backhaul or numerically from (31) for limited-capacity backhaul. In the considered setting, the values of $N_{th}$ range from 3000 to 20000 contents. Note that these results are obtained when each BS can schedule at most $N_t$ users. Nonetheless, the results are consistent with the analysis in Section IV-A and Proposition 2, which are obtained in the special case where each BS serves at most one user. By comparing Fig. 3(b) with Fig. 2(b), we can see that the EE gain from caching is much higher than the throughput gain from caching if ICI can be perfectly controlled (i.e., $\beta = 0$). This is because when the backhaul capacity is limited, the throughput gain of caching only comes from reducing ICI, but the EE gain also comes from reducing the proportion of power consumed for backhauling. To show the impact of shadowing, we also provide the simulation result of EE in Fig. 3(b), where the shadowing follows a log-normal distribution with 8 dB standard deviation. We can see that the network EE is slightly lower when shadowing is considered, but the main trend of the EE-cache relationship does not change.

(a) EE versus backhaul capacity, $\beta = 0.5$.
Fig. 3. EE versus backhaul capacity and cache capacity.

(a) EE versus $\eta$ under different skew parameters $\delta$.
(b) EE versus $\lambda/N_b$ under different cache capacities $\eta$.
Fig. 4. EE versus cache capacity and user density, $\beta = 0.5$, $C_{bh} = 100$ Mbps.

In Fig. 4(a), we show the numerical results of EE obtained from (21) versus the normalized cache capacity with different skew parameters $\delta$. We can see that the optimal cache capacity decreases with $\delta$. With the same cache capacity, the EE increases with $\delta$. This is because the cache hit ratio increases with $\delta$, as shown in (9). When $\delta = 1$, the EE gain of caching with optimized $\eta$ over not caching is about 350%. In Fig. 4(b), we show the numerical results of EE obtained from (21) versus the ratio of user density to BS density. We can see that the EE first increases quickly with $\lambda/N_b$ and then saturates gradually, because the throughput is finally limited by ICI. Moreover, the EE increases more sharply when cache is enabled. This is because caching increases the throughput and reduces the backhaul power consumption. When $\lambda/N_b$ is around one, which is typical for SCNs, the EE gain is about 230%.
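To make the dependence on $\delta$ concrete, the sketch below computes the hit ratio of caching the $N_c$ most popular contents under a Zipf-like popularity, which we assume is the form of (9); $N_f$ and $N_c$ are illustrative values. For $\delta = 1$ it also compares the exact ratio with $\ln N_c/\ln N_f$.

```python
import math


def hit_ratio(n_c, n_f, delta):
    """Fraction of requests served by caching the n_c most popular of n_f contents
    under a Zipf-like popularity with skew delta."""
    weights = [f ** -delta for f in range(1, n_f + 1)]
    return sum(weights[:n_c]) / sum(weights)


N_F, N_C = 10_000, 100
for delta in (0.6, 0.8, 1.0, 1.2):
    print(f"delta={delta}: p_h={hit_ratio(N_C, N_F, delta):.3f}")

# For delta = 1 the exact ratio is close to ln(N_c)/ln(N_f).
print(hit_ratio(N_C, N_F, 1.0), math.log(N_C) / math.log(N_F))
```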

In Fig. 5, we show the numerical results of EE obtained from (26) versus the cell-edge SNR (which is controlled by changing the transmit power and hence reflects the impact of transmit power) and the normalized cache capacity under unlimited-capacity backhaul and very stringent-capacity backhaul. As analyzed in Section IV-B, with a given cache capacity, the EE first increases and then decreases with $P$. We also plot the optimal transmit power $P_0$ as a function of $\eta$, denoted as $P_0(\eta)$, as well as the optimal normalized cache capacity $\eta_0$ as a function of $P$, denoted as $\eta_0(P)$. We can see that $P_0(\eta)$ increases slowly with $\eta$, as analyzed in Section IV-B, and $\eta_0(P)$ increases slowly with $P$ under very stringent-capacity backhaul. This implies that in a cache-enabled network with stringent-capacity backhaul, the transmit power has only a minor impact on the EE-optimal cache capacity, and the cache capacity has little impact on the optimal transmit power. Besides, it is easy to see that the joint optimal values of $\eta$ and $P$ maximizing the network EE are given by the crossing point of $\eta_0(P)$ and $P_0(\eta)$. This means that $(P_0, \eta_0)$ satisfying both $\frac{\partial \mathrm{EE}}{\partial P} = 0$ in (40) and $\frac{\partial \mathrm{EE}}{\partial \eta} = 0$ in (35) are the joint optimal transmit power and cache capacity in the considered system setting, although the EE is not jointly concave in $P$ and $\eta$, as analyzed in Section IV-B.
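The crossing point of $\eta_0(P)$ and $P_0(\eta)$ can be found by alternating the two best responses, $\eta \leftarrow \eta_0(P)$ from (35) and $P \leftarrow P_0(\eta)$ from (40). The sketch below does this for a hypothetical noise-limited setting with $\beta = 0$ and unlimited backhaul, assuming $p_h \approx \ln(\eta N_f)/\ln N_f$; $D^{\alpha}\sigma^2$, $w_{bh}$ and the other values are assumptions. It only verifies that the fixed point is a coordinate-wise maximum; it does not prove joint optimality.

```python
import math


def lambert_w(x, tol=1e-12):
    """Principal-branch Lambert W for x >= 0 via Newton's method."""
    w = math.log1p(x)
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1))
        w -= step
        if abs(step) < tol * (1 + abs(w)):
            break
    return w


# Hypothetical noise-limited setting (beta = 0), unlimited backhaul.
B, ALPHA, N_T, P_A, RHO = 20e6, 3.67, 4, 0.8, 15.13
P_CC = P_A * 10.16 + (1 - P_A) * 3.85
NOISE_EDGE = 1e-3          # D^alpha * sigma^2 [W] (assumed)
W_BH, W_CA, F, N_F = 5e-8, 6.25e-12, 30 * 8e6, 10_000
L = math.log(N_F)


def rate(p):
    return B * (ALPHA / (2 * math.log(2)) + math.log2(N_T * p / NOISE_EDGE))


def eta_best(p):
    """eta_0(P): (35) with beta = 0."""
    return P_A * W_BH * rate(p) / (W_CA * F * N_F * L)


def p_best(eta):
    """P_0(eta): (40) with average cache power w_ca*eta*N_f*F."""
    s = P_CC + W_CA * eta * N_F * F
    arg = N_T * s * math.exp(ALPHA / 2 - 1) / (P_A * RHO * NOISE_EDGE)
    return s / (P_A * RHO * lambert_w(arg))


def ee(eta, p):
    """EE with the assumed hit ratio p_h = ln(eta*N_f)/ln(N_f)."""
    p_h = math.log(eta * N_F) / L
    power = P_A * RHO * p + P_CC + W_CA * eta * N_F * F + P_A * W_BH * (1 - p_h) * rate(p)
    return P_A * rate(p) / power


# Alternate the two best responses until they settle on the crossing point.
p, eta = 0.1, 0.01
for _ in range(100):
    eta = eta_best(p)
    p = p_best(eta)
print(f"crossing point: eta = {eta:.4f}, P = {p:.4f} W")
```

The iteration converges quickly here because, as in Fig. 5, each best response depends only weakly on the other variable.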

(b) $C_{bh} \to 0$.
Fig. 5. EE versus cell-edge SNR and normalized cache capacity, $\beta = 0$, $\delta = 1$.

D. Where Can Caching Provide Higher EE?

To illustrate where deploying the caches can provide higher EE, we compare the throughput and EE achieved by caching at the macro and pico BSs. For a fair comparison, we deploy three tiers of macro BSs similar to the pico network. The radius of each macro cell is 250 m, i.e., the coverage area of each macro cell is the same as that of $N_b = 37$ pico cells. To ensure that the pico network and the macro network have the same total number of antennas and the same sum backhaul capacity within the same coverage area, each macro BS is equipped with $4 \times 37$ antennas, and the backhaul capacities of each pico BS and each macro BS are 100 Mbps and $100 \times 37$ Mbps, respectively. The power consumption parameters of the macro BS are $\rho = 3.22$, $P = 46$ dBm, $P_{cc_i} = 2.01 \times 10^3$ W (13.6 W per antenna) and $P_{cc_a} = 3.81 \times 10^3$ W (25.8 W per antenna) [27]. If each BS caches $N_c$ contents, the total cache capacities of the macro and pico networks will be $N_c F$ and $N_b N_c F$, respectively. In this simulation, we set the two networks with the same total cache capacity; hence each pico BS caches fewer contents.

We can see from Fig. 6(a) that when the total cache capacity of the network is low, the throughput of the macro network is higher than that of the pico network due to the higher backhaul capacity of each BS. When $\beta = 1$, the throughput of the macro network does not change with cache capacity, but the throughput of the pico network increases with cache capacity. This is because the backhaul capacity of each macro BS is so large that interference is the limiting factor of throughput, while the backhaul capacity of each pico BS is low, so that increasing cache capacity can relieve the backhaul congestion and hence increase the throughput. When there is no interference, i.e., $\beta = 0$, backhaul capacity becomes the bottleneck of both networks and thus their throughputs increase with cache capacity. We can see from Fig. 6(b) that the EE of the pico network is higher than that of the macro network, since the pico BSs have more opportunities to idle and have lower transmit and circuit powers, although the cache capacity of each pico BS is smaller than that of each macro BS. The EE of the pico network benefits more from caching, even though more replicas of the same content are cached. This is because the backhaul capacity limits the throughput of each pico BS, while the backhaul power consumption takes a large portion of the energy consumed in the pico network.

Fig. 6. Throughput and EE comparison between macro and pico networks, $N_f$ = 10®. The throughput is evaluated within a region of 250 m radius including one macro cell and 37 pico cells. The curves stop when $N_c = N_f$, i.e., all contents are cached at each BS. The curves of the pico network stop earlier because each pico BS caches fewer contents than each macro BS, since the two networks are set with identical total cache capacity.
(a) An illustration of distributed caching.



Fig. 7. Impact of user association with distributed caching and shadowing. 


E. Impact of User Association 

In the system model, we have assumed that each user is associated with the closest BS, and hence caching the most popular contents at each BS is optimal. Now we relax this assumption and consider a user association based on both location and content. As shown in (26), the EE increases with the cache hit ratio $p_h$. To increase $p_h$, we consider a distributed caching strategy where every three adjacent BSs cache different contents, and each user associates with the nearest BS that caches the user's requested content. As illustrated in Fig. 7(a), the BS marked with “A” caches the 1st, 4th, 7th, …, $(3N_c - 2)$th most popular contents, the second BS of the group caches the 2nd, 5th, 8th, …, $(3N_c - 1)$th most popular contents, and the BS marked with “O” caches the 3rd, 6th, 9th, …, $3N_c$th most popular contents. This way of caching reduces content redundancy by storing different contents in different BSs. Then, when each BS caches $N_c$ contents with distributed caching, each user can access $3N_c$ cached contents, i.e., the equivalent cache capacity seen by each user can be regarded as three times that with non-distributed caching.
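The rank-interleaved placement described above can be written in a few lines. The sketch below uses hypothetical helpers, not code from the paper: it assigns popularity ranks to a group of three adjacent BSs and compares the Zipf hit ratio of the $3N_c$ reachable contents with that of caching the same top $N_c$ contents at every BS. The values of $N_c$, $N_f$ and $\delta$ are illustrative.

```python
def distributed_assignment(n_c, group=3):
    """Popularity ranks (1-based) cached by each BS in a group of `group` adjacent BSs:
    BS b caches ranks b+1, b+1+group, b+1+2*group, ... (n_c contents each)."""
    return [[b + 1 + group * i for i in range(n_c)] for b in range(group)]


def zipf_hit_ratio(ranks, n_f, delta):
    """Request fraction covered by a set of cached popularity ranks (Zipf-like popularity)."""
    total = sum(f ** -delta for f in range(1, n_f + 1))
    return sum(f ** -delta for f in ranks) / total


N_C, N_F, DELTA = 100, 10_000, 0.8
caches = distributed_assignment(N_C)
reachable = set().union(*caches)                       # contents a user can fetch from the group
print(caches[0][:4], caches[1][:4], caches[2][:4])     # [1, 4, 7, 10] [2, 5, 8, 11] [3, 6, 9, 12]
print(zipf_hit_ratio(range(1, N_C + 1), N_F, DELTA),   # every BS caches the top N_c
      zipf_hit_ratio(reachable, N_F, DELTA))           # distributed: top 3*N_c reachable
```

The higher hit ratio only pays off if users can actually be served by a non-nearest BS without excessive interference, which is the trade-off examined in Fig. 7(b).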

In Fig. 7(b), we show the simulation results of EE with distributed caching and non-distributed caching. We can see that when $\beta = 0$, i.e., no interference, distributed caching can achieve higher EE due to a higher cache hit ratio. When $\beta = 1$, i.e., in the worst case of interference, distributed caching achieves lower EE than non-distributed caching. This is because with distributed caching each user may not always associate with the nearest BS, and hence the nearest BS may generate strong interference to the user, which reduces the EE. When shadowing is considered and each user is associated with the BS with the highest average channel gain, the network EE is slightly lower, but the main trend of the EE-cache relationship does not change for both non-distributed and distributed caching.

VI. Conclusion 

In this paper, we investigated whether and how caching at BSs can improve the EE of wireless access networks. By analyzing the EE of the cache-enabled network, we found the condition under which EE can benefit from caching, the EE-memory relation, and the maximal EE gain from caching. Analytical results showed that EE can be improved by caching at the BSs when power-efficient cache hardware is used. A key observation is that the EE gain of caching comes from boosting the throughput, reducing the backhaul power consumption, and exploiting the content popularity when the backhaul is limited. The EE gain is large when the interference level is low, the backhaul capacity is stringent, and the content popularity distribution is skewed. Another key observation is that the EE-memory relation is not a simple tradeoff. When the content catalog size is not very large, there is a tradeoff between EE and cache capacity. Otherwise, optimizing the cache capacity of each BS can maximize the EE of the network. The EE-optimal cache capacity depends on the system setting, and decreases when the network becomes denser. Numerical and simulation results validated the analysis and showed that caching at pico BSs can provide higher EE gain than caching at macro BSs. Finally, we provided simulation results to illustrate that distributed caching will achieve much higher EE gain than simply caching popular contents everywhere if inter-cell interference can be successfully eliminated, but will be inferior to the simple caching policy if the interference cannot be coordinated.


Appendix A 
Proof of Lemma 1 

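Considering that the SINRs for the users shown in (3) are identically distributed, $R_{ca}(K_b, K_c)$ can be derived as

$$\begin{aligned}
R_{ca}(K_b,K_c) &= K_c B\, \mathrm{E}\left\{\log_2\left(1 + \frac{|\mathbf{h}_{kb}^H \mathbf{w}_{kb}|^2 r_{kb}^{-\alpha}}{K_b(\beta I_k + \sigma^2/P)}\right)\right\}\\
&\overset{(a)}{\approx} K_c B \Big(\mathrm{E}\{\log_2 |\mathbf{h}_{kb}^H \mathbf{w}_{kb}|^2\} - \log_2 K_b + \mathrm{E}\{\log_2 r_{kb}^{-\alpha}\} - \mathrm{E}\{\log_2(\beta I_k + \sigma^2/P)\}\Big)\\
&\overset{(b)}{=} K_c B \Big(\frac{\psi(N_t - K_b + 1)}{\ln 2} - \log_2 K_b + \int_0^D \log_2(r_{kb}^{-\alpha})\, \frac{2 r_{kb}}{D^2}\, dr_{kb} - \mathrm{E}\{\log_2(\beta I_k + \sigma^2/P)\}\Big)\\
&\overset{(c)}{\approx} K_c B \Big(\log_2\frac{N_t - K_b + 1}{K_b} + \frac{\alpha}{2\ln 2} - \log_2 D^{\alpha} - \mathrm{E}\{\log_2(\beta I_k + \sigma^2/P)\}\Big)
\end{aligned} \qquad (A.1)$$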


where the approximation in step (a) comes from omitting the term “1” inside the log function, which is accurate in the high-SINR region; step (b) comes from the facts that $|\mathbf{h}_{kb}^H \mathbf{w}_{kb}|^2$ follows the Gamma distribution $\mathcal{G}(N_t - K_b + 1, 1)$ [42] and that $\frac{2 r_{kb}}{D^2}$ is the probability density function (PDF) of $r_{kb}$ when the user is uniformly distributed in the circular cell; and step (c) is obtained by evaluating the integral, $\int_0^D \log_2(r_{kb}^{-\alpha})\frac{2r_{kb}}{D^2}dr_{kb} = \frac{\alpha}{2\ln 2} - \log_2 D^{\alpha}$, and by applying the asymptotic approximation of the Digamma function $\psi(n)$, i.e., $\psi(n) = \ln(n) + O(\frac{1}{n}) \approx \ln n$ [43], which is accurate when $N_t - K_b + 1 \gg 1$.

When the network is interference-limited, i.e., the interference power $\beta P I_k \gg \sigma^2$,

$$\mathrm{E}\{\log_2(\beta I_k + \sigma^2/P)\} \approx \mathrm{E}\{\log_2(\beta I_k)\} \qquad (A.2)$$

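Considering the expression of $I_k$ defined in (3), where the activity indicator $\zeta_j$ of BS$_j$ satisfies $\mathrm{E}\{\zeta_j\} = 1 \cdot p_a + 0 \cdot (1 - p_a) = p_a$, we have

$$\mathrm{E}\{\log_2(\beta I_k)\} = \mathrm{E}\{\log_2(I_k D^{\alpha})\} + \log_2(\beta D^{-\alpha}) \qquad (A.3)$$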

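where $\mathrm{E}\{\log_2(I_k D^{\alpha})\}$ can be derived as

$$\begin{aligned}
\mathrm{E}\{\log_2(I_k D^{\alpha})\} &= \mathrm{E}\left\{\log_2 \sum_{j=1, j\neq b}^{N_b} \zeta_j \left(\frac{D}{r_{kj}}\right)^{\alpha} \|\mathbf{h}_{kj}^H \mathbf{W}_j\|^2\right\}\\
&\overset{(a)}{\le} \mathrm{E}\left\{\log_2 \sum_{j=1, j\neq b}^{N_b} \mathrm{E}\{\zeta_j\} \left(\frac{D}{r_{kj}}\right)^{\alpha} \|\mathbf{h}_{kj}^H \mathbf{W}_j\|^2\right\}\\
&= \Phi + \log_2 p_a = \log_2(p_a 2^{\Phi})
\end{aligned} \qquad (A.4)$$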


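where the upper bound in step (a) comes from Jensen's inequality, and the bound is tight when $\lambda/N_b$ is high (then $p_a \to 1$ and hence $\zeta_j \to \mathrm{E}\{\zeta_j\}$), and $\Phi$ is a constant depending only on the path-loss exponent $\alpha$ when $N_b \to \infty$ (to be proved in Appendix B). By substituting (A.4) into (A.3) and then into (A.2), we obtain

$$\mathrm{E}\{\log_2(\beta I_k + \sigma^2/P)\} \lesssim \log_2(p_a \beta 2^{\Phi} D^{-\alpha}) \approx \log_2\left(p_a \beta 2^{\Phi} D^{-\alpha} + \frac{\sigma^2}{P}\right) \qquad (A.5)$$

where the approximation comes from the fact that when $\beta P I_k \gg \sigma^2$, we have $\log_2(p_a \beta 2^{\Phi} D^{-\alpha}) \ge \mathrm{E}\{\log_2(\beta I_k)\} \gg \log_2 \frac{\sigma^2}{P}$, which means $p_a \beta 2^{\Phi} D^{-\alpha} \gg \frac{\sigma^2}{P}$.

When the network is noise-limited, i.e., $\beta P I_k \ll \sigma^2$, we also have $\mathrm{E}\{\log_2(\beta I_k + \frac{\sigma^2}{P})\} \approx \log_2 \frac{\sigma^2}{P} \approx \log_2(p_a \beta 2^{\Phi} D^{-\alpha} + \frac{\sigma^2}{P})$, which is the same as the result in (A.5).¹²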

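By substituting (A.5) into (A.1), $R_{ca}(K_b, K_c)$ can be approximated as

$$R_{ca}(K_b,K_c) \approx K_c B\left(\frac{\alpha}{2\ln 2} + \log_2 \frac{(N_t - K_b + 1)P}{K_b(p_a \beta P 2^{\Phi} + D^{\alpha}\sigma^2)}\right) = K_c\left(\frac{\alpha B}{2\ln 2} + R_e(K_b)\right) \qquad (A.6)$$

where $R_e(K_b) = B \log_2 \frac{(N_t - K_b + 1)P}{K_b(p_a \beta P 2^{\Phi} + D^{\alpha}\sigma^2)}$ is obtained from $B\,\mathrm{E}\{\log_2(1+\gamma_k)\}$ by setting $r_{kb} = D$ in (A.1). Hence, $R_e(K_b)$ can be regarded as the average achievable rate of a cell-edge user when the backhaul capacity is unlimited and BS$_b$ serves $K_b$ users.

¹²In Section V-A, we use simulations to show that (A.5) is accurate for all values of $\beta \in [0,1]$.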
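Steps (b) and (c) of (A.1) rest on two facts that are easy to check by simulation: for a user uniformly distributed in a disc of radius $D$, $\mathrm{E}\{\log_2 r_{kb}^{-\alpha}\} = \frac{\alpha}{2\ln 2} - \log_2 D^{\alpha}$, and for $X \sim \mathcal{G}(n,1)$, $\mathrm{E}\{\ln X\} = \psi(n)$, which the derivation approximates by $\ln n$. The Monte Carlo sketch below uses only Python's standard library; the values of $D$, $\alpha$ and $n$ are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """psi(n) for a positive integer n: H_{n-1} minus Euler's constant."""
    return sum(1.0 / k for k in range(1, n)) - EULER_GAMMA


def mc_log_pathloss(d, alpha, samples, rng):
    """Monte Carlo E{log2 r^-alpha} for a user uniformly distributed in a disc of radius d."""
    total = 0.0
    for _ in range(samples):
        r = d * math.sqrt(rng.random())   # r has PDF 2r/d^2 on [0, d]
        r = max(r, 1e-12)                 # guard against log(0)
        total += -alpha * math.log2(r)
    return total / samples


def mc_log_gamma(n, samples, rng):
    """Monte Carlo E{ln X} for X ~ Gamma(n, 1), the distribution used in step (b)."""
    return sum(math.log(rng.gammavariate(n, 1.0)) for _ in range(samples)) / samples


rng = random.Random(2024)
D, ALPHA = 40.0, 3.67
analytic = ALPHA / (2 * math.log(2)) - ALPHA * math.log2(D)   # integral term of step (c)
estimate = mc_log_pathloss(D, ALPHA, 200_000, rng)
print(f"E[log2 r^-alpha]: analytic {analytic:.3f}, Monte Carlo {estimate:.3f}")

for n in (4, 64):   # n = N_t - K_b + 1
    print(n, digamma_int(n), math.log(n), mc_log_gamma(n, 100_000, rng))
```

For integer $n$ the gap $\ln n - \psi(n)$ lies between $\frac{1}{2n}$ and $\frac{1}{n}$, which is why step (c) needs $N_t - K_b + 1 \gg 1$.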


Appendix B 

Proof of the Constant $\Phi$ When $N_b \to \infty$

In the following, we first prove that $\Phi$ depends only on $\alpha$ and $N_b$, and then prove that $\Phi$ converges when $N_b \to \infty$. Without loss of generality, we assume that the coordinate of BS$_b$ is $(0,0)$. Denoting $(x_k, y_k)$ and $(u_j, v_j)$ as the coordinates of MS$_k$ and BS$_j$, respectively, we have $r_{kb} = \sqrt{x_k^2 + y_k^2}$ and $r_{kj} = \sqrt{(x_k - u_j)^2 + (y_k - v_j)^2}$. Denoting $g_{kj} = \|\mathbf{h}_{kj}^H \mathbf{W}_j\|^2$ and taking the expectation over the user location in (A.4), we obtain


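$$\Phi = \frac{1}{\pi D^2}\iint_{x_k^2 + y_k^2 < D^2} \mathrm{E}\left\{\log_2 \sum_{j=1, j\neq b}^{N_b} \left(\frac{D^2}{(x_k - u_j)^2 + (y_k - v_j)^2}\right)^{\frac{\alpha}{2}} g_{kj}\right\} dx_k\, dy_k \qquad (B.1)$$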


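We normalize the coordinates of MS$_k$ and BS$_j$ with the cell radius $D$ as $(\tilde{x}_k, \tilde{y}_k) = (\frac{x_k}{D}, \frac{y_k}{D})$ and $(\tilde{u}_j, \tilde{v}_j) = (\frac{u_j}{D}, \frac{v_j}{D})$, respectively. After changing the integration variables to $\tilde{x}_k$ and $\tilde{y}_k$, (B.1) can be rewritten as

$$\Phi = \frac{1}{\pi}\iint_{\tilde{x}_k^2 + \tilde{y}_k^2 < 1} \mathrm{E}\left\{\log_2 \sum_{j=1, j\neq b}^{N_b} \left((\tilde{x}_k - \tilde{u}_j)^2 + (\tilde{y}_k - \tilde{v}_j)^2\right)^{-\frac{\alpha}{2}} g_{kj}\right\} d\tilde{x}_k\, d\tilde{y}_k \qquad (B.2)$$

Since the normalized coordinates $(\tilde{x}_k, \tilde{y}_k)$ and $(\tilde{u}_j, \tilde{v}_j)$ do not depend on $D$, and $g_{kj}$ is averaged over the small-scale fading channel in (B.2), $\Phi$ depends only on $\alpha$ and $N_b$.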

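By using Jensen's inequality in (B.2) to move the expectation into the log function and considering $\mathrm{E}\{g_{kj}\} = 1$, we obtain

$$\Phi \le \frac{1}{\pi}\iint_{\tilde{x}_k^2 + \tilde{y}_k^2 < 1} \log_2 \sum_{j=1, j\neq b}^{N_b} \left((\tilde{x}_k - \tilde{u}_j)^2 + (\tilde{y}_k - \tilde{v}_j)^2\right)^{-\frac{\alpha}{2}} d\tilde{x}_k\, d\tilde{y}_k \qquad (B.3)$$

Considering $\alpha > 2$ in practice and after some manipulations, we can show that $\sum_{j=1, j\neq b}^{N_b} \left((\tilde{x}_k - \tilde{u}_j)^2 + (\tilde{y}_k - \tilde{v}_j)^2\right)^{-\frac{\alpha}{2}}$ converges when $N_b \to \infty$. Therefore, $\Phi$ has an upper bound. Further considering that $\Phi$ increases with $N_b$, $\Phi$ converges when $N_b \to \infty$.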
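Appendix C
Proof of Lemma 2

Consider that when $N_t \to \infty$, $\frac{|\mathbf{h}_{kb}^H \mathbf{w}_{kb}|^2}{N_t} \to 1$ and the variance of $\frac{|\mathbf{h}_{kb}^H \mathbf{w}_{kb}|^2}{N_t}$ approaches zero, resulting from channel hardening [44]. Besides, when the interference power from each BS is independent and identically distributed (i.i.d.),¹³ the interference power per BS, $\frac{I_k}{N_b}$, approaches its expectation $\frac{\mathrm{E}\{I_k\}}{N_b}$ when $N_b \to \infty$, according to the law of large numbers. This suggests that the distance $r_{kb}$ between each user and its local BS dominates the comparison between $B\sum_{k=K_c+1}^{K_b}\log_2(1+\gamma_k)$ and $C_{bh}$ when $N_t$ is large, and therefore the second term in (10) can be approximated as

$$\begin{aligned}
R_{bh}(K_b,K_c,C_{bh}) &= \mathrm{E}\left\{\min\left(B\sum_{k=K_c+1}^{K_b}\log_2(1+\gamma_k),\ C_{bh}\right)\right\}\\
&\approx \mathrm{E}_{r_{kb}}\left\{\min\left(B\sum_{k=K_c+1}^{K_b}\mathrm{E}_{\mathbf{h}, r_{kj}, \zeta}\{\log_2(1+\gamma_k)\},\ C_{bh}\right)\right\}
\end{aligned} \qquad (C.1)$$

which is accurate, as shown via simulations in Section V-A.

By omitting the term “1” inside the log function, approximating $\psi(n)$ by $\ln(n)$ similar to the derivation for (A.1), and further considering (A.5) and the definition of $R_e(K_b)$, we have

$$\mathrm{E}_{\mathbf{h}, r_{kj}, \zeta}\{\log_2(1+\gamma_k)\} \approx \log_2\frac{(N_t - K_b + 1)P}{K_b(p_a\beta P 2^{\Phi} + D^{\alpha}\sigma^2)} + \log_2\frac{D^{\alpha}}{r_{kb}^{\alpha}} = \frac{R_e(K_b)}{B} + \frac{\alpha}{2\ln 2}\, 2\ln\frac{D}{r_{kb}} \qquad (C.2)$$

By substituting (C.2) into (C.1), we obtain

$$R_{bh}(K_b,K_c,C_{bh}) \approx \mathrm{E}_{r_{kb}}\left\{\min\left((K_b - K_c)R_e(K_b) + \frac{\alpha B}{2\ln 2}\sum_{k=K_c+1}^{K_b} 2\ln\frac{D}{r_{kb}},\ C_{bh}\right)\right\} \triangleq \mathrm{E}_{r_{kb}}\{\tilde{R}_{bh}\} \qquad (C.3)$$

where we define $\tilde{R}_{bh}$ to denote the term inside $\mathrm{E}_{r_{kb}}\{\cdot\}$ in (C.3) for notational simplicity.

With the PDF of $r_{kb}$, i.e., $\frac{2r_{kb}}{D^2}$, we have $\Pr\{2\ln\frac{D}{r_{kb}} > t\} = \Pr\{r_{kb} < De^{-t/2}\} = e^{-t}$, so $\{2\ln\frac{D}{r_{kb}}, k = 1,\dots,K_b, b = 1,\dots,N_b\}$ are independent exponentially distributed RVs with unit mean. Hence, the term $y = \sum_{k=K_c+1}^{K_b} 2\ln\frac{D}{r_{kb}}$ in (C.3) is a Gamma distributed RV following $\mathcal{G}(K_b - K_c, 1)$, i.e., it is positive, and the PDF of this term is $f(y) = \frac{y^{K_b - K_c - 1}e^{-y}}{\Gamma(K_b - K_c)}$, $y > 0$. This gives rise to the following results.

When $C_{bh} < (K_b - K_c)R_e(K_b)$, i.e., the backhaul capacity is less than the average achievable sum rate of all the cache-miss users under unlimited-capacity backhaul when they are located at the cell edge, the right-hand side (RHS) of (C.3) becomes, since $y > 0$,

$$\mathrm{E}_{r_{kb}}\{\tilde{R}_{bh}\} = C_{bh} \qquad (C.4)$$

When $C_{bh} \ge (K_b - K_c)R_e(K_b)$, considering

$$\tilde{R}_{bh} = \begin{cases}(K_b - K_c)R_e(K_b) + \frac{\alpha B}{2\ln 2}\,y, & y < z\\ C_{bh}, & y \ge z\end{cases} \qquad (C.5)$$

where $z = \frac{2\ln 2}{\alpha B}\left(C_{bh} - (K_b - K_c)R_e(K_b)\right)$, the RHS of (C.3) can be derived as

$$\begin{aligned}
\mathrm{E}_{r_{kb}}\{\tilde{R}_{bh}\} &= \int_0^{z}\left((K_b - K_c)R_e(K_b) + \frac{\alpha B}{2\ln 2}\,y\right) f(y)\,dy + \int_z^{\infty} C_{bh} f(y)\,dy\\
&= (K_b - K_c)\left(\frac{\alpha B}{2\ln 2}\,\bar{\gamma}(K_b - K_c + 1, z) + R_e(K_b)\,\bar{\gamma}(K_b - K_c, z)\right) + C_{bh}\,\bar{\Gamma}(K_b - K_c, z)
\end{aligned} \qquad (C.6)$$

where $\bar{\gamma}(s,z)$ and $\bar{\Gamma}(s,z)$ denote the regularized lower and upper incomplete Gamma functions, respectively. Combining (C.4) and (C.6), Lemma 2 is proved.

¹³When the spatial distribution of the BSs also follows a PPP, the interference power from each BS is indeed i.i.d. [45].

Appendix D
Proof of Proposition 1

With $N_c = 0$ and $p_h = 0$, from (26) the EE without caching can be obtained as $\mathrm{EE}_{no} = \frac{p_a R_{bh}}{p_a P_a + (1-p_a)P_i + p_a w_{bh} R_{bh}}$. If $\mathrm{EE}_{no}$ exceeds the EE with caching in (26), then with (9) we have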


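$$w_{ca} N_c F \sum_{j=1}^{N_f} j^{-\delta} > \frac{(p_a P_a + (1-p_a)P_i)(R_{ca} - R_{bh}) + p_a w_{bh} R_{ca} R_{bh}}{R_{bh}} \sum_{f=1}^{N_c} f^{-\delta} \qquad (D.1)$$

If (D.1) holds for $N_c = 1$, then

$$w_{ca} F \sum_{j=1}^{N_f} j^{-\delta} > \frac{(p_a P_a + (1-p_a)P_i)(R_{ca} - R_{bh}) + p_a w_{bh} R_{ca} R_{bh}}{R_{bh}} \qquad (D.2)$$

Multiplying both sides of (D.2) by $N_c$, we obtain

$$w_{ca} N_c F \sum_{j=1}^{N_f} j^{-\delta} > \frac{(p_a P_a + (1-p_a)P_i)(R_{ca} - R_{bh}) + p_a w_{bh} R_{ca} R_{bh}}{R_{bh}}\, N_c \qquad (D.3)$$

Further considering that $N_c \ge \sum_{f=1}^{N_c} f^{-\delta}$, (D.3) turns into

$$w_{ca} N_c F \sum_{j=1}^{N_f} j^{-\delta} > \frac{(p_a P_a + (1-p_a)P_i)(R_{ca} - R_{bh}) + p_a w_{bh} R_{ca} R_{bh}}{R_{bh}} \sum_{f=1}^{N_c} f^{-\delta} \qquad (D.4)$$

which is the same as (D.1). This suggests that if caching one content cannot improve EE, then caching cannot improve EE for any $N_c > 1$ either. Therefore, (D.2) is the condition of whether caching can increase EE. (D.2) can be rewritten as (27), and Proposition 1 is proved.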
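The closed form (C.6) in Appendix C can be validated against a Monte Carlo average of $\min\left((K_b-K_c)R_e(K_b) + \frac{\alpha B}{2\ln 2}y,\ C_{bh}\right)$ with $y \sim \mathcal{G}(K_b-K_c,1)$. Because $K_b - K_c$ is a positive integer, the regularized lower incomplete Gamma function has the finite-sum form $1 - e^{-z}\sum_{k=0}^{s-1} z^k/k!$, so no special-function library is assumed. The rates below, in Mbit/s, are illustrative values only.

```python
import math
import random


def reg_lower_gamma(s, z):
    """Regularized lower incomplete Gamma P(s, z) for a positive integer s."""
    term, acc = 1.0, 1.0
    for k in range(1, s):
        term *= z / k
        acc += term
    return 1.0 - math.exp(-z) * acc


def expected_backhaul_rate(n, r_e, c, c_bh):
    """(C.4)/(C.6): E{min(n*R_e + c*y, C_bh)} with y ~ Gamma(n, 1),
    where n = K_b - K_c and c = alpha*B/(2 ln 2)."""
    if c_bh < n * r_e:
        return c_bh                                                       # (C.4)
    z = (c_bh - n * r_e) / c
    lower_n = reg_lower_gamma(n, z)
    lower_n1 = reg_lower_gamma(n + 1, z)
    return n * (c * lower_n1 + r_e * lower_n) + c_bh * (1.0 - lower_n)    # (C.6)


def monte_carlo(n, r_e, c, c_bh, samples=200_000, seed=1):
    """Sample average of the term inside E_{r_kb}{.} in (C.3)."""
    rng = random.Random(seed)
    return sum(min(n * r_e + c * rng.gammavariate(n, 1.0), c_bh)
               for _ in range(samples)) / samples


# Illustrative numbers in Mbit/s: alpha = 3.67, B = 20 MHz.
N, R_E, C_BH = 3, 50.0, 300.0
C = 3.67 * 20.0 / (2 * math.log(2))
print(expected_backhaul_rate(N, R_E, C, C_BH), monte_carlo(N, R_E, C, C_BH))
```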
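Appendix E
Proof of Proposition 2

From $\left.\frac{\partial \mathrm{EE}}{\partial \eta}\right|_{\eta = \eta_0} = 0$, we can obtain $\frac{\Omega}{\eta_0 N_f} + \ln\frac{1}{\eta_0 N_f} = \frac{R_{bh}}{R_{ca} - R_{bh}}\ln N_f - 1$. Adding $\ln \Omega$ on both sides of the equation, we obtain

$$\frac{\Omega}{\eta_0 N_f} + \ln\frac{\Omega}{\eta_0 N_f} = \ln \Omega + \frac{R_{bh}}{R_{ca} - R_{bh}}\ln N_f - 1 \qquad (E.1)$$

Taking the exponential of both sides of (E.1), we have

$$\frac{\Omega}{\eta_0 N_f}\, e^{\frac{\Omega}{\eta_0 N_f}} = \Omega\, e^{\frac{R_{bh}}{R_{ca} - R_{bh}}\ln N_f - 1}$$

Since $W(x)$ satisfies $W(x)e^{W(x)} = x$, letting $x = \Omega\, e^{\frac{R_{bh}}{R_{ca} - R_{bh}}\ln N_f - 1}$, we obtain

$$\frac{\Omega}{\eta_0 N_f} = W\!\left(\Omega\, e^{\frac{R_{bh}}{R_{ca} - R_{bh}}\ln N_f - 1}\right) \qquad (E.2)$$

Since $\frac{\Omega}{\eta N_f} + \ln\frac{\Omega}{\eta N_f}$ decreases with $\eta$, $\frac{\partial \mathrm{EE}}{\partial \eta} > 0$ when $\eta < \eta_0$ and $\frac{\partial \mathrm{EE}}{\partial \eta} < 0$ when $\eta > \eta_0$. Rewriting (E.2) as (31) and further considering $\eta \le 1$, Proposition 2 can be proved.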


Appendix F 

Proof of Corollary 4

Denote $N_b \pi D^2 = c$, where $c$ is a constant. Substituting $D = \sqrt{\frac{c}{N_b \pi}}$ and $p_a = 1 - e^{-\lambda/N_b}$ into (35) and then taking the derivative of $\eta_0$ in (35) with respect to $N_b$, we obtain


-WhhB 


j^e "i. + a (1 - e 


(^1-e A) 


dyo _ _ 

dNh 2Ni,WcaFNf in Nf in 2 ^ 2 ^ ]ja _ g“ ^ 

^ (F.l) 


Nr 


A , 

+ "6 a - 2 + log2- 

I P«/32*+ (,*,) II 

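Since the path-loss exponent $\alpha > 2$, we have $\frac{\partial \eta_0}{\partial N_b} < 0$, i.e., $\eta_0$ decreases with $N_b$.

When $\lambda/N_b \to 0$, we have $p_a = 1 - e^{-\lambda/N_b} \approx \lambda/N_b$. Then, from (35), $\eta_0 N_b$ can be expressed as

$$\eta_0 N_b \approx \frac{\lambda w_{bh} B}{w_{ca} F N_f \ln N_f}\left(\frac{\alpha}{2\ln 2} + \log_2 \frac{N_t}{\frac{\lambda}{N_b}\beta 2^{\Phi} + \left(\frac{N_b^{\alpha/2} P}{(c/\pi)^{\alpha/2}\sigma^2}\right)^{-1}}\right) \qquad (F.2)$$

from which we can see that $\eta_0 N_b$ increases with $N_b$.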


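Appendix G
Proof of Corollary 7

By substituting $p_a \beta P 2^{\Phi} \ll D^{\alpha}\sigma^2$ into (39) and letting $\left.\frac{\partial \mathrm{EE}}{\partial P}\right|_{P = P_0} = 0$, we obtain

$$\frac{\bar{P}_{cc} + \bar{P}_{ca}}{p_a \rho P_0} - \ln\frac{N_t P_0}{D^{\alpha}\sigma^2} - \frac{\alpha}{2} + 1 = 0 \qquad (G.1)$$

where $\bar{P}_{cc} = p_a P_{cc_a} + (1-p_a)P_{cc_i}$ is the average circuit power consumption of each BS, and $\bar{P}_{ca} = w_{ca}\eta N_f F$ is the average cache power consumption of each BS. Letting $x = \frac{\bar{P}_{cc} + \bar{P}_{ca}}{p_a \rho P_0}$, (G.1) becomes $x e^{x} = \frac{N_t(\bar{P}_{cc} + \bar{P}_{ca})e^{\frac{\alpha}{2} - 1}}{p_a \rho D^{\alpha}\sigma^2}$, from which we can derive (40). Since in practice the path-loss exponent $\alpha > 2$, $\ln\frac{N_t P}{D^{\alpha}\sigma^2} + \frac{\alpha}{2} - 1 > 0$, and the left-hand side (LHS) of (G.1) decreases with $P_0$. Therefore, $\frac{\partial \mathrm{EE}}{\partial P} > 0$ when $P < P_0$ and $\frac{\partial \mathrm{EE}}{\partial P} < 0$ when $P > P_0$, which indicates that $P_0$ is the optimal transmit power maximizing the network EE.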